3' GENOMIC PROMOTER REGION AND POLYMERASE GENE
MUTATIONS RESPONSIBLE FOR ATTENUATION IN VIRUSES
OF THE ORDER DESIGNATED MONONEGAVIRALES

Field Of The Invention 

This invention relates to isolated,
recombinantly-generated, attenuated, nonsegmented,
negative-sense, single stranded RNA viruses of the
Order designated Mononegavirales having at least one
attenuating mutation in the 3' genomic promoter region
and having at least one attenuating mutation in the RNA
polymerase gene. This invention was made with
Government support under a grant awarded by the Public
Health Service. The Government has certain rights in
the invention.



Enveloped, negative-sense, single stranded
RNA viruses are uniquely organized and expressed. The
genomic RNA of negative-sense, single stranded viruses
serves two template functions in the context of a
nucleocapsid: as a template for the synthesis of
messenger RNAs (mRNAs) and as a template for the
synthesis of the antigenome (+) strand. Negative-sense,
single stranded RNA viruses encode and package
their own RNA-dependent RNA polymerase. Messenger RNAs
are only synthesized once the virus has been uncoated
in the infected cell. Viral replication occurs after
synthesis of the mRNAs and requires the continuous
synthesis of viral proteins. The newly synthesized
antigenome (+) strand serves as the template for
generating further copies of the (-) strand genomic
RNA.

The polymerase complex actuates and achieves
transcription and replication by engaging the
cis-acting signals at the 3' end of the genome, in
particular, the promoter region. Viral genes are then
transcribed from the genome template unidirectionally
from its 3' to its 5' end. There is always less mRNA
made from the downstream genes (e.g., the polymerase
gene (L)) relative to their upstream neighbors (i.e.,
the nucleoprotein gene (N)). Therefore, there is always
a gradient of mRNA abundance according to the position
of the genes relative to the 3' end of the genome.
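
The polar gradient described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the polymerase continues past each gene junction with a fixed probability (the hypothetical parameter `reinit_prob`) and uses the measles virus gene order N, P, M, F, H, L. Neither the function name nor the value 0.7 comes from this specification.

```python
def mrna_gradient(gene_order, reinit_prob=0.7):
    """Relative mRNA abundance per gene, listed from the 3' end of the genome.

    reinit_prob is a hypothetical, illustrative probability that the
    polymerase goes on to transcribe the next downstream gene.
    """
    if not 0.0 < reinit_prob <= 1.0:
        raise ValueError("reinit_prob must be in (0, 1]")
    abundance = {}
    level = 1.0  # the 3'-proximal gene is taken as the reference level
    for gene in gene_order:
        abundance[gene] = level
        level *= reinit_prob  # each junction reduces downstream output
    return abundance


# Measles virus gene order, 3' to 5' on the genome.
MEASLES_GENE_ORDER = ["N", "P", "M", "F", "H", "L"]

gradient = mrna_gradient(MEASLES_GENE_ORDER)
for gene, level in gradient.items():
    print(f"{gene}: {level:.3f}")
```

Under these assumptions the N mRNA is the most abundant and the L mRNA the least, which is the qualitative behaviour the paragraph describes; the actual ratios are not modelled.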

Based on the 1993 reclassification
by the International Committee on Taxonomy of
Viruses, an Order, designated Mononegavirales, has been
established. This Order contains three families of
enveloped viruses with single stranded, nonsegmented
RNA genomes of minus polarity (negative-sense). These
families are the Paramyxoviridae, Rhabdoviridae and
Filoviridae. The family Paramyxoviridae has been
further divided into two subfamilies, Paramyxovirinae
and Pneumovirinae. The subfamily Paramyxovirinae
contains three genera, Paramyxovirus, Rubulavirus and
Morbillivirus. The subfamily Pneumovirinae contains
the genus Pneumovirus.

The new classification is based upon
morphological criteria, the organization of the viral
genome, biological activities and the sequence
relationships of the proteins. The morphological
distinguishing feature among enveloped viruses for the
subfamily Paramyxovirinae is the size and shape of the
nucleocapsids (diameter 18 nm, 1 µm in length, pitch of
5.5 nm), which have a left-handed helical symmetry. The
biological criteria are: 1) antigenic cross-reactivity
between members of a genus, and 2) the presence of
neuraminidase activity in the genera Paramyxovirus and
Rubulavirus and its absence in the genus Morbillivirus. In
addition, variations in the coding potential of the P
gene are considered, as is the presence of an extra
gene (SH) in Rubulaviruses.

Pneumoviruses can be distinguished from
Paramyxovirinae morphologically because they contain
narrow nucleocapsids. In addition, pneumoviruses have
major differences in the number of protein-encoding
cistrons (10 in pneumoviruses versus 6 in
Paramyxovirinae) and an attachment protein (G) that is
very different from that of Paramyxovirinae. Although
the paramyxoviruses and pneumoviruses have six proteins
that appear to correspond in function (N, P, M, G/H/HN,
F and L), only the latter two proteins exhibit
significant sequence relatedness between the two
subfamilies. Several pneumoviral proteins lack
counterparts in most of the paramyxoviruses, namely the
nonstructural proteins NS1 and NS2, the small
hydrophobic protein SH, and a second protein M2. Some
paramyxoviral proteins, namely C and V, lack
counterparts in pneumoviruses. However, the basic
genomic organization of pneumoviruses and
paramyxoviruses is the same. The same is true of
rhabdoviruses and filoviruses. Table 1 presents the
current taxonomical classification of these viruses,
together with examples of each genus.

Table 1

Classification of Nonsegmented, negative-sense, single
stranded RNA Viruses of the Order Mononegavirales

Family Paramyxoviridae
    Subfamily Paramyxovirinae
        Genus Paramyxovirus
            Sendai virus (mouse parainfluenza virus type 1)
            Human parainfluenza virus (PIV) types 1 and 3
            Bovine parainfluenza virus (BPV) type 3
        Genus Rubulavirus
            Simian virus 5 (SV) (Canine parainfluenza virus type 2)
            Mumps virus
            Newcastle disease virus (NDV) (avian Paramyxovirus 1)
            Human parainfluenza virus types 2, 4a and 4b
        Genus Morbillivirus
            Measles virus (MV)
            Dolphin Morbillivirus
            Canine distemper virus (CDV)
            Peste-des-petits-ruminants virus
            Phocine distemper virus
            Rinderpest virus
    Subfamily Pneumovirinae
        Genus Pneumovirus
            Human respiratory syncytial virus (RSV)
            Bovine respiratory syncytial virus
            Pneumonia virus of mice
            Turkey rhinotracheitis virus
Family Rhabdoviridae
    Genus Lyssavirus
        Rabies virus
    Genus Vesiculovirus
        Vesicular stomatitis virus
    Genus Ephemerovirus
        Bovine ephemeral fever virus
Family Filoviridae
    Genus Filovirus
        Marburg virus

For many of these viruses, no vaccines of any
kind are available. Thus, there is a need to develop
vaccines against such human and animal pathogens. Such
vaccines would have to elicit a protective immune
response in the recipient. The qualitative and
quantitative features of such a favorable response are
extrapolated from those seen in survivors of natural
virus infection, who, in general, are protected from
reinfection by the same or highly related viruses for
some significant duration thereafter.

A variety of approaches can be considered in
seeking to develop such vaccines, including the use of:
(1) purified individual viral protein vaccines (subunit
vaccines); (2) inactivated whole virus preparations;
and (3) live, attenuated viruses.

Subunit vaccines have the desirable feature
of being pure, definable and relatively easily produced
in abundance by various means, including recombinant
DNA expression methods. To date, with the notable
exception of hepatitis B surface antigen, viral subunit
vaccines have generally only elicited short-lived
and/or inadequate immunity, particularly in naive
recipients.

Formalin-inactivated whole virus preparations
of polio (IPV) and hepatitis A have proven safe and
efficacious. In contrast, immunization with similarly
inactivated whole viruses such as respiratory syncytial
virus and measles virus vaccines elicited unfavorable
immune responses and/or response profiles which
predisposed vaccinees to exaggerated or aberrant
disease when subsequently confronted with the natural
or "wild-type" virus.

Early attempts (1966) to vaccinate young
children used a parenterally administered
formalin-inactivated RSV vaccine. Unfortunately, several field
trials of this vaccine revealed serious adverse
reactions: the development of a severe illness with
unusual features following subsequent natural infection
with RSV (Bibliography entries 1,2). It has been
suggested that this formalinized RSV antigen elicited
an abnormal or unbalanced immune response profile,
predisposing the vaccinee to RSV disease (3,4).

Thereafter, live, attenuated RSV vaccine
candidates were generated by cold passage or chemical
mutagenesis. These RSV strains were found to have
reduced virulence in seropositive adults.
Unfortunately, they proved either over- or
under-attenuated when given to seronegative infants; in some
cases, they also were found to lack genetic stability
(5,6). Another vaccination approach using parenteral
administration of live virus was ineffective and
efforts along this line were discontinued (7).
Notably, these live RSV vaccines were never associated
with disease enhancement as observed with the
formalin-inactivated RSV vaccine described above. Currently,
there are no RSV vaccines approved for administration
to humans, although clinical trials are now in progress
with cold-passaged, chemically mutagenized strains of
RSV designated A2 and B-1.

Appropriately attenuated live derivatives of
wild-type viruses offer a distinct advantage as vaccine
candidates. As live, replicating agents, they initiate
infection in recipients during which viral gene
products are expressed, processed and presented in the
context of the vaccinee's specific MHC class I and II
molecules, eliciting humoral and cell-mediated immune
responses, as well as the coordinate cytokine patterns,
which parallel the protective immune profile of
survivors of natural infection.



WO 98/13501 PCT/US97/16718



This favorable immune response pattern is
contrasted with the delimited responses elicited by
inactivated or subunit vaccines, which typically are
largely restricted to the humoral immune surveillance
arm. Further, the immune response profile elicited by
some formalin-inactivated whole virus vaccines, e.g.,
measles and respiratory syncytial virus vaccines
developed in the 1960's, has not only failed to
provide sustained protection, but in fact has led to a
predisposition to aberrant, exaggerated, and even fatal
illness, when the vaccine recipient later confronted
the wild-type virus.

While live, attenuated viruses have highly
desirable characteristics as vaccine candidates, they
have proven to be difficult to develop. The crux of
the difficulty lies in the need to isolate a derivative
of the wild-type virus which has lost its
disease-producing potential (i.e., virulence), while retaining
sufficient replication competence to infect the
recipient and elicit the desired immune response
profile in adequate abundance.

Historically, this delicate balance between
virulence and attenuation has been achieved by serial
passage of a wild-type viral isolate through different
host tissues or cells under varying growth conditions
(such as temperature). This process presumably favors
the growth of viral variants (mutants), some of which
have the favorable characteristic of attenuation.
Occasionally, further attenuation is achieved through
chemical mutagenesis as well.

This propagation/passage scheme typically
leads to the emergence of virus derivatives which are
temperature sensitive, cold-adapted and/or altered in
their host range -- one or all of which are changes
from the wild-type, disease-causing viruses, i.e.,
changes that may be associated with attenuation.

Several live virus vaccines, including those
for the prevention of measles and mumps (which are
paramyxoviruses), and for protection against polio and
rubella (which are positive strand RNA viruses), have
been generated by this approach and provide the
mainstay of current childhood immunization regimens
throughout the world.

Nevertheless, this means for generating
attenuated live virus vaccine candidates is lengthy
and, at best, unpredictable, relying largely on the
selective outgrowth of those randomly occurring genomic
mutants with desirable attenuation characteristics.
The resulting viruses may have the desired phenotype in
vitro, and even appear to be attenuated in animal
models. However, all too often they remain either
under- or overattenuated in the human or animal host
for whom they are intended as vaccine candidates.

Even as to current vaccines in use, there is
still a need for more efficacious vaccines. For
example, the current measles vaccines provide
reasonably good protection. However, recent measles
epidemics suggest deficiencies in the efficacy of
current vaccines. Despite maternal immunization, high
rates of acute measles infection have occurred in
children under age one, reflecting the vaccines'
inability to induce anti-measles antibody levels
comparable to those developed following wild-type
measles infection (8,9,10). As a result,
vaccine-immunized mothers are less able to provide their
infants with sufficient transplacentally-derived
passive antibodies to protect the newborns beyond the
first few months of life.

Acute measles infections in previously
immunized adolescents and young adults point to an
additional problem. These secondary vaccine failures
indicate limitations in the current vaccines' ability
to induce and maintain antiviral protection that is
both abundant and long-lived (11,12,13). Recently, yet
another potential problem was revealed. The
hemagglutinin protein of wild-type measles isolated
over the past 15 years has shown a progressively
increasing distance from the vaccine strains (14).
This "antigenic drift" raises legitimate concerns that
the vaccine strains may not contain the ideal antigenic
repertoire needed to provide optimal protection. Thus,
there is a need for improved vaccines.

Rational vaccine design would be assisted by
a better understanding of these viruses, in particular,
by the identification of the virally encoded
determinants of virulence as well as those genomic
changes which are responsible for attenuation.






Summary Of The Invention 



Accordingly, it is an object of this
invention to identify those regions of the genome of
the RNA viruses of the Order Mononegavirales where
mutations result in attenuation of those viruses.

It is a further object of this invention to
produce recombinantly-generated viruses which
incorporate such attenuating mutations in their
genomes.

It is still a further object of this
invention to formulate vaccines containing such
attenuated viruses.

These and other objects of the invention as
discussed below are achieved by the generation and
isolation of recombinantly-generated, attenuated,
nonsegmented, negative-sense, single stranded RNA
viruses of the Order Mononegavirales having at least
one attenuating mutation in the 3' genomic promoter
region and having at least one attenuating mutation in
the RNA polymerase gene.

In the case of measles virus, at least one
attenuating mutation in the 3' genomic promoter region
is selected from the group consisting of nucleotide 26
(A → T), nucleotide 42 (A → T or A → C) and
nucleotide 96 (G → A), where these nucleotides, as
well as others delineated in this application (unless
stated otherwise), are presented in positive strand,
antigenomic, that is, message (coding) sense, and at
least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 331
(isoleucine → threonine), 1409 (alanine → threonine),
1624 (threonine → alanine), 1649 (arginine →
methionine), 1717 (aspartic acid → alanine), 1936
(histidine → tyrosine), 2074 (glutamine → arginine)
and 2114 (arginine → lysine).
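
The measles virus promoter substitutions listed above can be written down as a small lookup table and applied to a sequence given in antigenomic (message) sense. The following is a minimal sketch only: the function name and the 100-nucleotide placeholder sequence are hypothetical, and the placeholder is not real measles virus sequence.

```python
# Measles virus 3' genomic promoter substitutions from the paragraph above,
# in positive-strand (antigenomic, message) sense, 1-based positions.
# Each entry maps position -> (wild-type base, allowed replacement bases).
MEASLES_PROMOTER_SUBSTITUTIONS = {
    26: ("A", {"T"}),
    42: ("A", {"T", "C"}),
    96: ("G", {"A"}),
}


def apply_promoter_substitution(antigenome, position, new_base):
    """Return a copy of `antigenome` carrying one listed substitution."""
    if position not in MEASLES_PROMOTER_SUBSTITUTIONS:
        raise KeyError(f"position {position} is not a listed promoter site")
    wild_type, allowed = MEASLES_PROMOTER_SUBSTITUTIONS[position]
    if antigenome[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    if new_base not in allowed:
        raise ValueError(f"{wild_type} -> {new_base} is not listed for {position}")
    return antigenome[: position - 1] + new_base + antigenome[position:]


# Hypothetical placeholder sequence (NOT real measles sequence): it only
# carries the wild-type bases at positions 26, 42 and 96.
TOY_ANTIGENOME = "C" * 25 + "A" + "C" * 15 + "A" + "C" * 53 + "G" + "C" * 4

mutant = apply_promoter_substitution(TOY_ANTIGENOME, 26, "T")
print(mutant[25])
```

Checking the wild-type base before substituting guards against off-by-one errors when positions quoted in message sense are mapped onto a cloned sequence.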

In the case of human parainfluenza virus type
3, at least one attenuating mutation in the 3' genomic
promoter region is selected from the group consisting
of nucleotide 23 (T → C), nucleotide 24 (C → T),
nucleotide 28 (G → T) and nucleotide 45 (T → A), and
at least one attenuating mutation in the RNA polymerase
gene is selected from the group consisting of
nucleotide changes which produce changes in an amino
acid selected from the group consisting of residues 942
(tyrosine → histidine), 992 (leucine →
phenylalanine), 1292 (leucine → phenylalanine), and
1558 (threonine → isoleucine).

In the case of human respiratory syncytial
virus subgroup B, at least one attenuating mutation in
the 3' genomic promoter region is selected from the
group consisting of nucleotide 4 (C → G) and the
insertion of an additional A in the stretch of A's at
nucleotides 6-11, and at least one attenuating mutation
in the RNA polymerase gene is selected from the group
consisting of nucleotide changes which produce changes
in an amino acid selected from the group consisting of
residues 353 (arginine → lysine), 451 (lysine →
arginine), 1229 (aspartic acid → asparagine), 2029
(threonine → isoleucine) and 2050 (asparagine →
aspartic acid).
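
The requirement common to the three viruses above, at least one listed promoter mutation together with at least one listed polymerase mutation, can be expressed as a simple set check. This is an illustrative sketch: the dictionary keys, the label used for the RSV insertion, and the function name are hypothetical, while the positions and residue numbers are those recited above.

```python
# Listed 3' genomic promoter sites (antigenomic-sense nucleotide positions).
PROMOTER_SITES = {
    "measles": {26, 42, 96},
    "PIV-3": {23, 24, 28, 45},
    # The string is a hypothetical label for the extra A in the A stretch.
    "RSV-B": {4, "A insertion at 6-11"},
}

# Listed L (polymerase) protein residues whose change is attenuating.
POLYMERASE_RESIDUES = {
    "measles": {331, 1409, 1624, 1649, 1717, 1936, 2074, 2114},
    "PIV-3": {942, 992, 1292, 1558},
    "RSV-B": {353, 451, 1229, 2029, 2050},
}


def has_required_combination(virus, promoter_sites, polymerase_residues):
    """True if at least one listed site of each kind is mutated."""
    promoter_hit = PROMOTER_SITES[virus] & set(promoter_sites)
    polymerase_hit = POLYMERASE_RESIDUES[virus] & set(polymerase_residues)
    return bool(promoter_hit) and bool(polymerase_hit)


print(has_required_combination("measles", {26}, {331, 2114}))
print(has_required_combination("PIV-3", {23}, set()))
```

The check only records which sites are altered; it does not verify the specific base or amino acid change at each site.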

In another embodiment of this invention, 
attenuated virus is used to prepare vaccines which 
elicit a protective immune response against the wild- 
type form of the virus. 

In yet another embodiment of this invention,
an isolated, positive strand, antigenomic message sense
nucleic acid molecule (or an isolated, negative strand
genomic sense nucleic acid molecule) having the
complete viral nucleotide sequence (whether of
wild-type virus or virus attenuated by non-recombinant
means) is manipulated by introducing one or more of the
attenuating mutations described in this application to
generate an isolated, recombinantly-generated
attenuated virus. This virus is then used to prepare
vaccines which elicit a protective immune response
against the wild-type form of the virus.

In still another embodiment of this
invention, such a complete wild-type or vaccine viral
nucleotide sequence is used: (1) to design PCR primers
for use in a PCR assay to detect the presence of the
corresponding virus in a sample; or (2) to design and
select peptides for use in an ELISA to detect the
presence of the corresponding virus in a sample.

Brief Description Of The Figures

Figure 1 depicts the passage history of the
Edmonston measles virus (15). The abbreviations have
the following meanings: HK - human kidney; HA - human
amnion; CE(am) - chick embryo; CEF - chick embryo
fibroblast; DK - dog kidney; WI-38 - human diploid
cells; SK - sheep kidney; * - plaque cloning. The
number following each abbreviation represents the
number of passages.

Figure 2 depicts a map of the measles virus
genome showing putative cis-acting regulatory elements
at and near the genome and antigenome termini. Top - a
schematic map of the measles virus genome, beginning at
the 3' end with 52 nucleotides of leader sequence (l)
and ending at the 5' terminus with 37 nucleotides of
trailer sequence (t). Gene boundaries are denoted by
vertical bars; below each gene is the number of
cistronic nucleotides. Bottom - an expanded schematic
view of the 3' extended genomic promoter regions of
genome and antigenome, showing the position and
sequence of the two highly conserved domains, A and B.
The intervening intergenic trinucleotide is denoted as
well. Nascent 5' RNAs encompassing the A' to B'
regions are presumed to contain the regulatory sequence
at which the N protein encapsidation initiates.

Figure 3 depicts a genetic map of the RSV
subgroup B wild-type strains designated 2B and 18537
(top portion), the intergenic sequences of those
strains (middle portion) and the 68 nucleotide overlap
between the M2 and L genes (bottom portion). The RSV
2B strain has six fewer nucleotides in the G gene,
encoding two fewer amino acid residues in the G
protein, as compared to the 18537 strain. The 2B
strain has 145 nucleotides in the 5' trailer region, as
compared to 149 nucleotides in the 18537 strain. The
2B strain has one more nucleotide in each of the NS-1,
NS-2 and N genes, and one fewer nucleotide in each of
the M and F genes, as compared to the 18537 strain.

Detailed Description Of The Invention

Transcription and replication of
negative-sense, single stranded RNA viral genomes are achieved
through the enzymatic activity of a multimeric protein
acting on the ribonucleoprotein core (nucleocapsid).
Naked genomic RNA cannot serve as a template. Instead,
these genomic sequences are recognized only when they
are entirely encapsidated by the N protein into the
nucleocapsid structure. It is only in that context
that the genomic and antigenomic terminal promoter
sequences are recognized to initiate the
transcriptional or replication pathways.

All paramyxoviruses require the two viral
proteins, L and P, for these polymerase pathways to
proceed. The pneumoviruses, including RSV, also
require the transcription elongation factor, M2, for
the transcriptional pathway to proceed efficiently.
Additional cofactors may also play a role, including
perhaps the virus-encoded NS1 and NS2 proteins, as well
as perhaps host-cell encoded proteins.

However, considerable evidence indicates that
it is the L protein which performs most, if not all, of
the enzymatic processes associated with transcription
and replication, including initiation and termination
of ribonucleotide polymerization, capping and
polyadenylation of mRNA transcripts, methylation and
perhaps specific phosphorylation of P proteins. The L
protein's central role in genomic transcription and
replication is supported by its large size, sensitivity
to mutations, and its catalytic level of abundance in
the transcriptionally active viral complex (16).

These considerations led to the proposal that
L proteins consist of a linear array of domains whose
concatenated structure integrates discrete functions
(17). Indeed, three such delimited, discrete elements
within the negative-sense virus L protein have been
identified based on their relatedness to defined
functional domains of other well-characterized
proteins. These include: (1) a putative RNA template
recognition and/or phosphodiester bond formation
domain; (2) an RNA binding element; and (3) an ATP
binding domain. All prior studies of L proteins of
nonsegmented negative-sense, single stranded RNA
viruses have revealed these putative functional
elements (17).

Without being bound by the following, it is
reasonable to presume that these non-protein coding,
promoter and other cis-acting genomic regulatory
domains are important determinants of the efficiency
with which transcription and replication by measles
virus (MV) and other viruses of the Order
Mononegavirales are actualized, in association with the
L protein, and that they may therefore be virulence
determinants for these viruses as well.

In summary, the invention is believed to
encompass a coordinate set of changes between the
cis-acting regulatory signal (3' genomic promoter region)
and the polymerase gene (L) which results in
attenuation of the virus while retaining sufficient
ability of the virus to replicate. Attenuation is
optimized by rational mutations of the 3' genomic
promoter region and the polymerase gene, which provide
the desired balance of replication efficiency, so that
the virus vaccine is no longer able to produce disease,
yet retains its capacity to infect the vaccinee's
cells, to express sufficiently abundant gene products
to elicit the full spectrum and profile of desirable
immune responses, and to reproduce and disseminate
sufficiently to maximize the abundance of the immune
response elicited.

Without being bound by the following,
attenuating mutations in the extended promoter (3'
genomic promoter region) and in the polymerase gene are
believed to affect the display of cis-acting signals
and the conformation of the polymerase complex engaging
these signals. For example, when encapsidated, the
promoter RNA is coiled in a helical array. Changes in
promoter sequence may affect the relative positions at
which the conserved signals are displayed relative to
one another. Specifically, the measles wild-type 3'
genomic promoter region has a pyrimidine (uracil) at
positions 26 and 42 (the antigenomic message sense
sequences have the purine adenine). The vaccine
strains have purines at those positions (the
antigenomic message sense sequences have the
corresponding pyrimidines; see Table 3 in Example 1
below). The larger purines may change the distance
and/or angular display between the conserved domains of
the promoter (e.g., in measles, positions 1-11 and
87-98), resulting in an altered spatial presentation of
the cis-acting signals to the polymerase.
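
The relationship between the antigenomic (message) sense bases quoted in this application and the bases actually present in the genomic promoter can be made explicit with a short sketch. It assumes only standard Watson-Crick pairing; position n counted from the 5' end of the antigenome pairs with position n counted from the 3' end of the genome. The function names are hypothetical.

```python
# Watson-Crick complement for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def genomic_base(antigenomic_base):
    """Base in the (-) strand genome opposite a message-sense base.

    Accepts T as well as U, since message-sense positions in the text
    are written with DNA-style letters.
    """
    base = antigenomic_base.upper().replace("T", "U")
    if base not in COMPLEMENT:
        raise ValueError(f"not a nucleotide: {antigenomic_base!r}")
    return COMPLEMENT[base]


def base_class(base):
    """Classify a nucleotide as 'purine' or 'pyrimidine'."""
    base = base.upper().replace("T", "U")
    if base not in COMPLEMENT:
        raise ValueError(f"not a nucleotide: {base!r}")
    return "purine" if base in PURINES else "pyrimidine"


# Position 26 or 42: wild type is A in message sense, vaccine strains T (or C at 42).
for label, message_base in [("wild-type", "A"), ("vaccine", "T"), ("vaccine", "C")]:
    g = genomic_base(message_base)
    print(label, message_base, "->", g, base_class(g))
```

Consistent with the paragraph above, a message-sense adenine corresponds to a genomic uracil (a pyrimidine), whereas a message-sense T or C corresponds to a genomic purine.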

Animal studies have demonstrated a decrease
in viral replication sufficient to avoid illness but
adequate to elicit the desired immune response. This
likely represents a decrease in transcription, a
decrease in gene expression of virally encoded
proteins, a decrease in antisense templates and,
therefore, the production of fewer new genomes. The
resulting attenuated viruses are significantly less
virulent than the wild-type.

The attenuating mutations described herein
may be introduced into viral strains by two methods:

(1) Conventional means such as chemical
mutagenesis during virus growth in cell cultures to
which a chemical mutagen has been added, selection of
virus that has been subjected to passage at suboptimal
temperature in order to select temperature sensitive
and/or cold adapted mutations, identification of mutant
viruses that produce small plaques in cell culture, and
passage through heterologous hosts to select for host
range mutations. These viruses are then screened for
attenuation of their biological activity in an animal
model. Attenuated viruses are subjected to nucleotide
sequencing of their 3' genomic promoter region and
polymerase genes to locate the sites of attenuating
mutations. Once this has been done, method (2) is then
carried out.

(2) A preferred means of introducing
attenuating mutations comprises making predetermined
mutations using site-directed mutagenesis. These
mutations are identified either by method (1) or by
reference to closely-related viruses whose attenuating
mutations are already known. One or more mutations are
introduced into each of the 3' genomic promoter region
and the polymerase gene. Cumulative effects of
different combinations of coding and non-coding changes
can also be assessed.

The mutations to the 3' genomic promoter
region and polymerase gene are introduced by standard
recombinant DNA methods into a DNA copy of the viral
genome. This may be a wild-type or a modified viral
genome background (such as viruses modified by method
(1)), thereby generating a new virus. Infectious
clones or particles containing these attenuating
mutations are generated using the cDNA "rescue" system,
which has been applied to a variety of viruses,
including Sendai virus (18); measles virus (19);
respiratory syncytial virus (20); rabies (21);
vesicular stomatitis virus (VSV) (15); and rinderpest
virus (23); these references are hereby incorporated by
reference. See, for measles virus rescue, published
International patent application WO 97/06270,
designating the United States (24); for PIV-3 rescue,
U.S. provisional patent application 60/047575 (25); for
RSV rescue, published International patent application
WO 97/12032, designating the United States (26); these
applications are hereby incorporated by reference.

Briefly, all Mononegavirales rescue systems
can be summarized as follows: Each requires a cloned
DNA equivalent of the entire viral genome placed
between a suitable DNA-dependent RNA polymerase
promoter (e.g., the T7 RNA polymerase promoter) and a
self-cleaving ribozyme sequence (e.g., the hepatitis
delta ribozyme) which is inserted into a propagatable
bacterial plasmid. This transcription vector provides
the readily manipulable DNA template from which the RNA
polymerase (e.g., T7 RNA polymerase) can faithfully
transcribe a single-stranded RNA copy of the viral
antigenome (or genome) with the precise, or nearly
precise, 5' and 3' termini. The orientation of the
viral genomic DNA copy and the flanking promoter and
ribozyme sequences determine whether antigenome or
genome RNA equivalents are transcribed. Also required
for rescue of new virus progeny are the virus-specific
trans-acting proteins needed to encapsidate the naked,
single-stranded viral antigenome or genome RNA
transcripts into functional nucleocapsid templates:
the viral nucleocapsid (N or NP) protein, the
polymerase-associated phosphoprotein (P) and the
polymerase (L) protein. These proteins comprise the
active viral RNA-dependent RNA polymerase which must
engage this nucleocapsid template to achieve
transcription and replication.

The trans-acting proteins required for
measles virus rescue are the encapsidating protein N,
and the polymerase complex proteins, P and L. For
PIV-3, the encapsidating protein is designated NP, and the
polymerase complex proteins are also referred to as P
and L. For RSV, the virus-specific trans-acting
proteins include N, P and L, plus an additional
protein, M2, the RSV-encoded transcription elongation
factor.
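
The virus-specific sets of trans-acting proteins just described lend themselves to a simple completeness check when planning a rescue experiment. The sketch below is illustrative only; the dictionary keys and the function name are hypothetical, while the protein sets are those stated above.

```python
# Trans-acting proteins required for rescue, as stated in the text.
REQUIRED_TRANS_ACTING = {
    "measles": {"N", "P", "L"},
    "PIV-3": {"NP", "P", "L"},
    "RSV": {"N", "P", "L", "M2"},
}


def missing_support_proteins(virus, supplied):
    """Return the required proteins not covered by the supplied ones."""
    return REQUIRED_TRANS_ACTING[virus] - set(supplied)


# Example: an RSV rescue planned with only N, P and L support plasmids.
print(missing_support_proteins("RSV", ["N", "P", "L"]))
print(missing_support_proteins("measles", ["N", "P", "L"]))
```

An empty result means every listed trans-acting protein is accounted for; it says nothing about expression levels or plasmid ratios, which the text does not specify here.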

Typically, these viral trans-acting proteins
are generated from one or more plasmid expression
vectors encoding the required proteins, although some
or all of the required trans-acting proteins may be
produced within mammalian cells engineered to contain
and express these virus-specific genes and gene
products as stable transformants.

The typical (although not necessarily
exclusive) circumstances for rescue include an
appropriate mammalian cell milieu in which T7
polymerase is present to drive transcription of the
antigenomic (or genomic) single-stranded RNA from the
viral genomic cDNA-containing transcription vector.
Either cotranscriptionally or shortly thereafter, this
viral antigenome (or genome) RNA transcript is
encapsidated into functional templates by the
nucleocapsid protein and engaged by the required
polymerase components produced concurrently from
co-transfected expression plasmids encoding the required
virus-specific trans-acting proteins. These events and
processes lead to the prerequisite transcription of
viral mRNAs, the replication and amplification of new
genomes and, thereby, the production of novel viral
progeny, i.e., rescue.

For the rescue of rabies, VSV and Sendai, T7
polymerase is provided by recombinant vaccinia virus
vTF7-3. This system, however, requires that the
rescued virus be separated from the vaccinia virus by
physical or biochemical means or by repeated passaging
in cells or tissues that are not a good host for
poxvirus. For MV cDNA rescue, this requirement is
avoided by creating a cell line that expresses T7
polymerase, as well as viral N and P proteins. Rescue
is achieved by transfecting the genome expression
vector and the L gene expression vector into the helper
cell line. Advantages of the host-range mutant of the
vaccinia virus, MVA-T7, which expresses the T7 RNA
polymerase, but does not replicate in mammalian cells,
are exploited to rescue RSV, Rinderpest virus and MV.
After simultaneous expression of the necessary
encapsidating proteins, synthetic full length
antigenomic viral RNAs are encapsidated, replicated and
transcribed by viral polymerase proteins and replicated
genomes are packaged into infectious virions. In
addition to such antigenomes, genome analogs have now
been successfully rescued for Sendai and PIV-3 (25,27).

The rescue system thus provides a composition
which comprises a transcription vector comprising an
isolated nucleic acid molecule encoding a genome or
antigenome of a nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales having
at least one attenuating mutation in the 3' genomic
promoter region and having at least one attenuating
mutation in the RNA polymerase gene, together with at
least one expression vector which comprises at least
one isolated nucleic acid molecule encoding the
trans-acting proteins necessary for encapsidation,
transcription and replication (e.g., N, P and L for
measles virus; NP, P and L for PIV-3; N, P, L and M2
for RSV). Host cells are then transformed or
transfected with the at least two expression vectors
just described. The host cells are cultured under
conditions which permit the co-expression of these
vectors so as to produce the infectious attenuated
virus.
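
As a rough illustration of the composition just described, the
virus-specific support proteins can be kept in a small lookup
table and checked before a rescue experiment is planned. The
Python sketch below is illustrative only: the names
REQUIRED_TRANS_ACTING and missing_support_proteins are
hypothetical, and only the protein sets themselves are taken from
the text above.

```python
# Trans-acting proteins named in the text for each virus.
# The dictionary and function names are hypothetical.
REQUIRED_TRANS_ACTING = {
    "measles": {"N", "P", "L"},
    "PIV-3": {"NP", "P", "L"},
    "RSV": {"N", "P", "L", "M2"},
}


def missing_support_proteins(virus, supplied):
    """Return the required trans-acting proteins not yet supplied."""
    return REQUIRED_TRANS_ACTING[virus] - set(supplied)


# An RSV rescue that supplies only N, P and L still lacks M2.
rsv_gap = missing_support_proteins("RSV", ["N", "P", "L"])
```

The same check applies whether the proteins come from
co-transfected plasmids or from a helper cell line; the sketch
only tracks which proteins are present, not their source.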

The rescued infectious virus is then tested
for its desired phenotype (temperature sensitivity,
cold adaptation, plaque morphology, and transcription
and replication attenuation), first by in vitro means.
The mutations at the cis-acting 3' genomic promoter
region are also tested using the minireplicon system
where the required trans-acting encapsidation and
polymerase activities are provided by wild-type or
vaccine helper viruses, or by plasmids expressing the
N, P and different L genes harboring gene-specific
attenuating mutations (19,28).

If the attenuated phenotype of the rescued
virus is present, challenge experiments are conducted
with an appropriate animal model. Non-human primates
provide the preferred animal model for the pathogenesis
of human disease. These primates are first immunized
with the attenuated, recombinantly-generated virus,
then challenged with the wild-type form of the virus.
Monkeys are infected by various routes, including but
not limited to intranasal, intratracheal or
subcutaneous routes of inoculation (29).
Experimentally infected rhesus and cynomolgus macaques
have also served as animal models for studies of
vaccine-induced protection against measles (30).
Protection is measured by such criteria as disease
signs and symptoms, survival, virus shedding and
antibody titers. If the desired criteria are met, the
attenuated, recombinantly-generated virus is considered
a viable vaccine candidate for testing in humans. The
"rescued" virus is considered to be
"recombinantly-generated", as are the progeny and later
generations of the virus, which also incorporate the
attenuating mutations.

Even if a "rescued" virus is underattenuated
or overattenuated relative to optimum levels for
vaccine use, this information is valuable for
developing such optimum strains.

Optimally, a codon containing an attenuating
point mutation may be stabilized by introducing a
second or a second plus a third mutation in the codon
without changing the amino acid encoded by the codon
bearing only the attenuating point mutation.
Infectious virus clones containing the attenuating and
stabilizing mutations are also generated using the cDNA
"rescue" system described above.

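The codon stabilization described above can be sketched in a few
lines of Python. Given a wild-type codon and the codon bearing the
attenuating point mutation, the sketch lists synonymous codons for
the attenuated amino acid that differ from the wild-type codon at
more positions than the single-mutation codon does. The function
names and the example codons (GAT for aspartic acid, GCT for
alanine) are hypothetical and are not taken from the sequences of
this document; the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def differences(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def stabilizing_codons(wild_codon, mutant_codon):
    """Synonymous alternatives to mutant_codon that carry extra changes.

    Returns (codon, differences from wild type) pairs, most changes
    first, for codons that encode the same amino acid as mutant_codon
    but differ from wild_codon at more positions than mutant_codon.
    """
    target = CODON_TABLE[mutant_codon]
    baseline = differences(wild_codon, mutant_codon)
    hits = [
        (codon, differences(wild_codon, codon))
        for codon, aa in CODON_TABLE.items()
        if aa == target and differences(wild_codon, codon) > baseline
    ]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Hypothetical Asp (GAT) to Ala (GCT) change: candidates with two changes.
candidates = stabilizing_codons("GAT", "GCT")
```

Each returned codon still encodes the attenuated amino acid, so
choosing one of them adds a second (or third) nucleotide change
without altering the protein.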
Measles virus serves as a useful model for
this invention, because sequence data are now available
as described herein for the disease-causing wild-type
virus and for the disease-preventing vaccines which
have a demonstrated history of efficacy.

Measles virus was first isolated in tissue
culture in 1954 (31) from an infected patient named
David Edmonston. This Edmonston strain of measles
became the progenitor for many live-attenuated measles
vaccines including Moraten, which is the current
vaccine in the United States (Attenuvax™; Merck Sharp &
Dohme, West Point, PA), was licensed in 1968 and has
proven to be efficacious.

Aggressive immunization programs instituted
in the mid to late 1960s resulted in the precipitous
drop in reported measles cases from near 700,000 in
1965 to 1500 in 1983. In parallel, other vaccine
strains were also developed from the Edmonston strain
(see Fig. 1): Schwarz (Institut Merieux, Lyon, France),
Zagreb (Zagreb, Yugoslavia) and AIK-C (Japan). These
other vaccines have also proven to be efficacious and
have been used extensively. An early, reactogenic,
underattenuated vaccine strain (Rubeovax™; Merck Sharp
& Dohme) produced measles-like illness in children and
its use thus was discontinued. It, however, was
further attenuated successfully to produce the Moraten
vaccine strain (see Fig. 1) (32). Live measles virus
vaccine provides a success story of the development of
an efficacious vaccine and provides a model for
understanding the molecular mechanisms of viral vaccine
attenuation among nonsegmented, negative-sense, single
stranded RNA viruses.

Because of its significance as a major cause
of human morbidity and mortality, measles virus (MV)
has been quite extensively studied. MV is a large,
relatively spherical, enveloped particle composed of
two compartments, a lipoprotein membrane and a
ribonucleoprotein particle core, each having distinct
biological functions (33). The virion envelope is a
host cell-derived plasma membrane modified by three
virus-specified proteins: The hemagglutinin (H;
approximately 80 kilodaltons (kD)) and fusion (F1,2;
approximately 60 kD) glycoproteins project on the
virion surface and confer host cell attachment and
entry capacities to the viral particle (16).
Antibodies to H and/or F are considered protective
since they neutralize the virus' ability to initiate
infection (34,35,36). The matrix (M; approximately 37
kD) protein is the amphipathic protein lining the
membrane's inner surface, which is thought to
orchestrate virion morphogenesis and thus consummate
virus reproduction (37). The virion core contains the
15,894 nucleotide long genomic RNA upon which template
activity is conferred by its intimate association with
approximately 2600 molecules of the approximately 60 kD
nucleocapsid (N) protein (38,39,40). Loosely
associated with this approximately one micron long
helical ribonucleoprotein particle are enzymatic levels
of the viral RNA dependent RNA polymerase (L;
approximately 240 kD) which, in concert with the
polymerase cofactor (P; approximately 70 kD), and
perhaps yet other virus-specified as well as
host-encoded proteins, transcribes and replicates the
MV genome sequences (41).

To date, the entire nucleotide sequences
(only for the Edmonston B laboratory strain and the
AIK-C vaccine strain), coding potential, and
organization of the MV genome have been reported (33).
The six virion structural proteins are encoded by six
contiguous, non-overlapping genes which are arrayed as
follows: 3'-N-P-M-F-H-L-5'. Two additional MV gene
products of as yet uncertain function have also been
identified. These two nonstructural proteins, known as
C (approximately 20 kD) and V (approximately 45 kD),
are both encoded by the P gene, the former by a second
reading frame within the P mRNA; the latter by a
cotranscriptionally edited P gene-derived mRNA which
encodes a hybrid protein having the amino terminal
sequences of P and a new zinc finger-like cysteine-rich
carboxy terminal domain (16).

In addition to the sequences encoding the
virus-specified proteins, the MV genome contains
distinctive non-protein coding domains resembling those
directing the transcriptional and replicative pathways
of related viruses (16,42). These regulatory signals
lie at the 3' and 5' ends of the MV genome and in short
internal regions spanning each intercistronic boundary.
The former encode the putative promoter and/or
regulatory sequence elements directing genomic
transcription, genome and antigenome encapsidation, and
replication. The latter signal transcription
termination and polyadenylation of each monocistronic
viral mRNA and then reinitiation of transcription of
the next gene. In general, the MV polymerase complex
appears to respond to these signals much as the
RNA-dependent RNA polymerases of other non-segmented
negative strand RNA viruses (16,42,43,44).

Transcription initiates at or near the 3' end
of the MV genome and then proceeds in a 5' direction
producing monocistronic mRNAs (40,42,45). As the
polymerase traverses the MV genomic template, it
encounters putative stop/start signals which, in 3' to
5' order, are: a semi-conserved transcription
termination/polyadenylation signal (A/G U/C UA A/U NN
A4, where N may be any of the four bases) at which each
monocistronic RNA is completed; a non-transcribed
intergenic trinucleotide punctuation mark (CUU; except
at the H:L boundary where it is CGU); and a
semi-conserved start signal for transcription initiation
of the next gene (AGG A/G NN C/A A A/G G A/U, where N
may be any of the four bases) (45,46). Since some
polymerase complexes fail to reinitiate, the abundance
of each MV mRNA diminishes in parallel with the
distance of the encoding gene from the genomic 3' end.
This mRNA gradient directly corresponds to the relative
abundance of each virus-specified protein. This
indicates that MV protein expression is ultimately
controlled at the transcriptional level (44).
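
The stop/start signals described above lend themselves to a
simple pattern search. The Python sketch below encodes the three
consensus elements as regular expressions and scans an RNA
sequence for a complete gene junction. It rests on two
assumptions that the text does not state outright: that the
consensus sequences are written in positive (message) sense, and
that "A4" denotes a run of four A residues. The names
find_junctions and JUNCTION, and the example sequence, are
hypothetical.

```python
import re

N = "[ACGU]"  # any of the four bases
END_SIGNAL = f"[AG][UC]UA[AU]{N}{{2}}A{{4}}"     # A/G U/C UA A/U NN A4
INTERGENIC = "C[UG]U"                            # CUU, or CGU at H:L
START_SIGNAL = f"AGG[AG]{N}{{2}}[CA]A[AG]G[AU]"  # AGG A/G NN C/A A A/G G A/U
JUNCTION = re.compile(f"({END_SIGNAL})({INTERGENIC})({START_SIGNAL})")


def find_junctions(rna):
    """Return (offset, end signal, intergenic, start signal) tuples.

    Offsets are 0-based. DNA-style input is accepted: T is read as U.
    """
    rna = rna.upper().replace("T", "U")
    return [
        (m.start(), m.group(1), m.group(2), m.group(3))
        for m in JUNCTION.finditer(rna)
    ]


# Synthetic example, not a measles sequence.
example = "GGGG" + "AUUAUCGAAAA" + "CUU" + "AGGAUUCAAGA" + "CCCC"
junctions = find_junctions(example)
```

Because the signals are only semi-conserved, a strict pattern of
this kind may miss genuine junctions in a real genome; it is
meant to show the layout of the three elements rather than to
serve as an annotation tool.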
The 3' and 5' MV genomic termini contain
non-protein coding sequences with distinct parallels to
the leader and trailer RNA encoding regions of VSV
(42). Nucleotides 1-55 define the region between the
genomic 3' terminus and the beginning of the N gene,
while 37 additional nucleotides can be found between
the end of the L gene and the 5' terminus of the
genome. However, unlike VSV, or even the
paramyxoviruses Sendai and NDV, MV does not transcribe
these terminal regions into short, unmodified (+) or
(-) sense leader RNAs (47,48,49). Instead, leader
readthrough transcripts, including full-length
polyadenylated leader:N, leader:N:P, leader:N:P:M, and
of course full-length antigenome MV RNAs are
transcribed (48,49). Thus, the short leader
transcript, the key operational element determining the
switch from transcription to replication of the VSV
single-stranded, negative polarity genome (50,51,52),
seems absent in MV. This leads to consideration and
exploration of alternative models for this crucial
reproductive event (42).

Measles virus, as well as all other
Mononegavirales except the rhabdoviruses, appears to
have extended its terminal regulatory domains beyond
the confines of leader and trailer encoding sequences
(42). For measles, these regions encompass the 107 3'
genomic nucleotides (the "3' genomic promoter region",
also referred to as the "extended promoter", which
comprises 52 nucleotides encoding the leader region,
followed by three intergenic nucleotides, and 52
nucleotides encoding the 5' untranslated region of N
mRNA) and the 109 5' end nucleotides (69 encoding the
3' untranslated region of L mRNA, the intergenic
trinucleotide and 37 nucleotides encoding the trailer).
Within these 3' terminal approximately 100 nucleotides
of both the genome and antigenome are two short regions
of shared nucleotide sequence: 14 of 16 nucleotides at
the absolute 3' ends of the genome and antigenome are
identical. Internal to those termini, an additional
region of 12 nucleotides of absolute sequence identity
has been located. Their position at and near the
sites at which the transcription of the MV genome must
initiate and replication of the antigenome must begin
suggests that these short unique sequence domains
encompass an extended promoter region.

These discrete sequence elements may dictate
alternative sites of transcription initiation, with the
internal domain mandating transcription initiation at
the N gene start site, and the 3' terminal domain
directing antigenome production (42,48,53). In
addition to their regulatory role as cis-acting
determinants of transcription and replication, these 3'
extended genomic and antigenomic promoter regions
encode the nascent 5' ends of antigenome and genome
RNAs, respectively. Within these nascent RNAs reside
as yet unidentified signals for N protein nucleation,
another key regulatory element required for
nucleocapsid template formation and consequently for
amplification of transcription and replication. Figure
2 schematically shows the location and sequence of
these highly conserved, putative cis-acting regulatory
domains.

Terminal non-protein coding regions similar
in location, size and spacing are present in the
genomes of other members of the family Paramyxoviridae,
though only 8-11 of their absolute terminal nucleotides
are shared by MV (42,54). The genomic termini of the
Morbillivirus canine distemper virus (CDV) display a
greater degree of homology with those of its MV
relative: 73% of the nucleotides of the leader and
trailer sequences of these two viruses are identical,
including 16 of 18 at the absolute 3' termini and 17 of
18 at their 5' ends (55). No accessory internal CDV
genomic domain sharing homology to that of the MV
extended promoter has been found. However, there is a
20 nucleotide long stretch lying between CDV genomic
nucleotides 85 and 104 and 15,587 and 15,606 in which
15 of the 20 nucleotides are complementary (GenBank
accession number AF 14953). This indicates that CDV,
like MV, contains an additional region within its
non-coding 3' genomic and antigenomic ends that may
provide important cis-acting promoter and/or regulatory
signals (55).

Additionally, the precise length of the
3'-leader region (55 nucleotides) is identical among
several members of the Family Paramyxoviridae (MV, CDV,
PIV-3, BPV-3, SV and NDV). Further evidence for the
importance of these extended, non-protein coding
regions comes from analyses of a large number of
distinct copy-back Defective Interfering Viruses (DIs)
recently cloned from subacute sclerosing
panencephalitis (SSPE) brain tissue. No DI with a stem
shorter than the 95 5' terminal genomic nucleotides was
found. This indicates that the minimal signals needed
for MV DI RNA replication and encapsidation extend well
beyond the 37 nucleotide long trailer sequence to
encompass the additional internal putative regulatory
domain (56).

As exemplified in part by measles virus, this
invention is directed to the concept that important
virulence/attenuation determinants reside in viral
genomic non-protein coding regulatory regions and in
the trans-acting transcription/replication enzyme
complex with which these cis-acting elements must
interact. The cis-acting domains are found both at the
3' and 5' ends of the MV genome, flanking the six
contiguous genes encoding viral structural proteins,
and within the MV genome as short regions encompassing
internal intergenic boundaries. The former encode the
putative promoter and/or regulatory sequence elements
directing the vital processes of genomic transcription,
genome and antigenome encapsidation, and replication.
The latter signal transcription termination and
polyadenylation of each monocistronic viral mRNA and
then reinitiation of transcription of the next gene.
The transcription/replication enzyme, the RNA dependent
RNA polymerase molecule, can modulate transcription
and/or replicative efficiency, thereby determining the
abundance of cytopathic viral gene products and/or
virion progeny.

Proof of the concept of this invention for
measles virus is obtained by first determining the
nucleotide sequences of the non-coding regulatory
regions (3' genomic promoter region) and the coding
regions of the L gene (with predicted amino acid
sequences) of the progenitor Edmonston wild-type MV
isolate, together with available measles vaccine
strains derived from this isolate (see Figure 1).
Independent other wild-type isolates were examined for
comparative purposes as well.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of four wild-type and five
vaccine measles strains, as well as the deduced amino
acid sequences of the RNA polymerase (L protein) of
these measles viruses, are set forth as follows with
reference to the appropriate SEQ ID NOS. contained
herein:

Virus          Nucleotide Sequence    L Protein Sequence

Wild-Type
Edmonston      SEQ ID NO:1            SEQ ID NO:2
1977           SEQ ID NO:3            SEQ ID NO:4
1983           SEQ ID NO:5            SEQ ID NO:6
Montefiore     SEQ ID NO:7            SEQ ID NO:8

Vaccine
Rubeovax™      SEQ ID NO:9            SEQ ID NO:10
Moraten        SEQ ID NO:11           SEQ ID NO:12
Zagreb         SEQ ID NO:13           SEQ ID NO:14
AIK-C          SEQ ID NO:15           SEQ ID NO:16

Each measles virus genome listed above is
15,894 nucleotides in length. Translation of the L
gene starts with the codon at nucleotides 9234-9236;
the translation stop codon is at nucleotides 15783-
15785. The translated L protein is 2,183 amino acids
long.
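
The stated protein length follows directly from the coordinates
of the open reading frame. As a small worked check in Python (the
function name orf_protein_length is hypothetical; coordinates are
taken to be 1-based and inclusive, as the text implies):

```python
def orf_protein_length(start_first_nt, stop_last_nt):
    """Amino acid count of an open reading frame.

    start_first_nt: first nucleotide of the start codon (1-based).
    stop_last_nt:   last nucleotide of the stop codon (1-based).
    """
    span = stop_last_nt - start_first_nt + 1
    if span % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3 - 1  # the stop codon is not translated


# Measles L gene: start codon begins at 9234, stop codon ends at 15785.
mv_l_length = orf_protein_length(9234, 15785)
```

The span of 6,552 nucleotides corresponds to 2,184 codons, of
which the last is the stop codon, leaving 2,183 amino acids.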

Note that nucleotide 2499 of 1983 wild-type
measles virus is indicated as "G" in SEQ ID NO:5. In
fact, the base is actually a mixture of "G" and "C".
Also note that nucleotide 2143 of Rubeovax™ vaccine
virus is indicated as "T" in SEQ ID NO:9. In nine
clones sequenced, this base was "T" in seven and "C" in
two; thus, this base can be "T" or "C".

In addition, the Schwarz vaccine virus genome
is identical to that of the Moraten vaccine virus
genome (SEQ ID NO:11), except that at nucleotides 4917
and 4924, Schwarz has a "C" instead of a "T".

Nucleotide differences distinguishing the 3'
genomic promoter region and nucleotide and amino acid
differences distinguishing the L gene and L protein
sequences of the Edmonston wild-type isolate, vaccine
strains and other independently isolated wild-type
viruses were then compared and aligned (see Tables 3-5
in Example 1 below).

As shown in Table 3, three mutations
distinguish the 3' genomic promoter region (in
antigenomic, message sense) of the derivative vaccine
strains from that of the progenitor wild-type MV
isolate: at nucleotide position 26, from "A" to "T"; at
position 42, from "A" to "C" or from "A" to "T"; and in
the case of Zagreb only, at position 96, from "G" to
"A". In addition, the other examined wild-type isolates
differed from both the progenitor wild-type isolate and
the vaccine strains at position 50 by having "A"
instead of "G".

The predicted amino acid sequences of the L
genes of measles vaccine strains (Rubeovax™, Moraten,
Schwarz, AIK-C and Zagreb) and wild-type isolates
(1977, 1983 and Montefiore) differ from the progenitor
strain (Edmonston) at 49 positions in the 2183 amino
acid long open reading frame (see Tables 4 and 5 in
Example 1 below).

These amino acid differences can be divided
into four categories:

(1) Positions where one vaccine strain
differs from the progenitor, as well as from other
vaccine and wild-type strains, suggesting a potential
attenuation site.

(2) Specific differences between all
wild-type and all vaccine sequences; these may also
constitute important attenuation sites.

(3) Residues where chronologically newer
wild-types differ from older wild-types, which may be
attributable to genetic drift.

(4) Positions where one or more vaccine
strains and/or wild-type strains have common amino
acids and differ from all the other strains; these
changes may represent lineage-specific, potentially
attenuating changes within the vaccine strains and
relatedness among the wild-type isolates, respectively.
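
The four categories above can be read as a decision procedure
applied to one alignment column at a time. The Python sketch
below is one possible reading, not the procedure actually used to
build Tables 4 and 5. The function name classify_position is
hypothetical, Schwarz is omitted because its genome is nearly
identical to that of Moraten, and except for the residues at
positions 1717 and 1624 (which follow the text) the per-strain
residues are illustrative placeholders.

```python
OLDER_WILD = ["Edmonston", "1977"]
NEWER_WILD = ["1983", "Montefiore"]
VACCINES = ["Rubeovax", "Moraten", "Zagreb", "AIK-C"]
ALL_STRAINS = OLDER_WILD + NEWER_WILD + VACCINES


def classify_position(residues):
    """Assign category 1-4 to a dict mapping strain name to residue."""
    wild = {residues[s] for s in OLDER_WILD + NEWER_WILD}
    vaccine = {residues[s] for s in VACCINES}
    # (2) all wild types agree, all vaccines agree, and the two differ
    if len(wild) == 1 and len(vaccine) == 1 and wild != vaccine:
        return 2
    # (1) exactly one vaccine strain differs from every other strain
    values = [residues[s] for s in ALL_STRAINS]
    lone = [s for s in VACCINES if values.count(residues[s]) == 1]
    if len(lone) == 1 and len(set(values)) == 2:
        return 1
    # (3) older wild types agree, newer wild types agree, groups differ
    older = {residues[s] for s in OLDER_WILD}
    newer = {residues[s] for s in NEWER_WILD}
    if len(older) == 1 and len(newer) == 1 and older != newer:
        return 3
    # (4) everything else: one or more strains differ from the rest
    return 4


# Position 1717: Asp in all wild types, Ala in all vaccines (from the text).
pos_1717 = {**dict.fromkeys(OLDER_WILD + NEWER_WILD, "D"),
            **dict.fromkeys(VACCINES, "A")}
# Position 1624: Thr to Ala in AIK-C only (from the text).
pos_1624 = {**dict.fromkeys(ALL_STRAINS, "T"), "AIK-C": "A"}
# Illustrative drift column: newer wild types share a new residue.
drift = {**dict.fromkeys(ALL_STRAINS, "K"), "1983": "R", "Montefiore": "R"}
# Illustrative lineage column: two vaccines share a change.
lineage = {**dict.fromkeys(ALL_STRAINS, "A"), "Rubeovax": "T", "Moraten": "T"}
```

The order of the tests matters: category (2) is checked first
because it is the most specific pattern, and category (4) is the
residual class.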

There were four category (1) changes where
one vaccine differed from the other vaccines, as well
as the wild-type strains. Two of these were in Moraten
and Schwarz (amino acids 331 and 2114) and two were in
AIK-C (1624 and 2074). These mutations are of special
interest because all of these viruses are good
vaccines. Thus, these positions are sites for
attenuation.

Only one position, 1717, fits into category
(2), with all wild-types having aspartic acid and all
vaccines having alanine. Interestingly, this position
is in one of two areas where the L genes of measles and
canine distemper virus (which are otherwise highly
homologous) do not show exceptional conservation. This
difference makes it more likely that 1717 is a key
position for an attenuating mutation in measles.

There were five positions, 149, 636, 720,
2017 and 2119, where both chronologically newer
wild-types (1983 and Montefiore) differ from older
wild-types (Edmonston and 1977), which therefore fit into
category (3). These differences suggest genetic drift
rather than denoting sites of attenuating mutations.
Not included in this total are 16 positions where
Montefiore (the 1989 isolate) differed from the rest
(see Table 5). These could be either genetic drift
(category (3)) or random change (category (4)). The
remaining 23 positions are category (4), with one or
more of the viruses differing from the consensus.

Three of these positions (1409, 1649, 1936)
are potentially attenuating category (4) mutations.
These are changes where two vaccine strains have a
common change from the progenitor wild-type strain.
These changes may be connected with the vaccine lineage
leading to the Rubeovax™ and Moraten vaccines (Figure
1).

Applicants have found that their AIK-C
vaccine strain nucleotide sequence differs from the
published sequence (33) at 21 positions, including one
insertion and one deletion. Several of these
differences result in coding changes including two in
the L gene (at amino acids 1477 and 2008).

Thus, the additional changes accrued within
the L gene sequence as the measles progenitor strain is
progressively attenuated to achieve a replicative
capacity optimized for live vaccine purposes appear to
be constrained and delimited. Presumably, this limited
tolerance in the number and location of L gene changes
is imposed not only by the need to preserve the
multifunctional capacities of the polymerase, but also
by the preexisting 3' promoter changes with which the
evolving L protein must interact to achieve
transcription and replication. In other words, optimal
virus attenuation requires coordinate (i.e., linked)
changes in the polymerase protein and the cis-acting
regulatory elements on which it acts.

The 3'-leader displays the least tolerance
for change, allowing highly selected changes during the
attenuation process at nucleotide position 26 (always
a change from "A" to "T"), and at position 42 (a
change from "A" to "C" or from "A" to "T") (in
antigenomic, message sense). In the case of Zagreb
only, there is a single further change, from "G" to "A"
at position 96, which may be important when combined
with Zagreb L gene-specific changes. The 3'-leader
region seems to have undergone only one instance of
genetic drift since 1954, with a change of "G" to "A"
at position 50 (see Table 3).

The net change in the 3' genomic promoter
region during the attenuation process is the
replacement of two pyrimidines by two purines in
genomic sense in all MV vaccine strains. The
co-evolution of the L gene during these attenuation
processes is believed to reflect selection of subtle
changes favoring reproduction of the viruses in
different host cells. All the vaccine strains were
grown in chick embryo (CE) or chick embryo fibroblast
(CEF) cells during their attenuation process (Figure
1). In addition, some vaccine strains have been
exposed to unique host cells; i.e., Zagreb vaccine was
grown in dog kidney cells and human diploid cells,
while the AIK-C vaccine was adapted to sheep kidney
cells. Moraten and Rubeovax™ were exclusively
developed in CE and CEF.

Some of the lineage-specific L gene changes
(position 1649 in Rubeovax™, Moraten and Schwarz
vaccines and the change at position 1717 in all
vaccines) represent a subset of adaptations of the L
gene to the 3'-leader to modulate the
transcription/replication processes for vaccine
attenuation. Additionally, individual vaccine-specific
changes (category (1)) may provide additional fine tune
modulation of virus replication/transcription for each
vaccine strain.

Based on Table 3 and the foregoing
discussion, the key attenuating mutations for the MV 3'
genomic promoter region are nucleotide 26 (A → T),
nucleotide 42 (A → T or A → C) and nucleotide 96 (G →
A) (in antigenomic, message sense).
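
When these promoter mutations are introduced into a cDNA design
in silico, it helps to confirm the expected wild-type base at
each position before substituting it. The Python sketch below
shows one way to do that. The function name
apply_point_mutations is hypothetical, and the 107 nucleotide
"wild-type" string is a placeholder built only so that positions
26, 42 and 96 carry the expected bases; it is not the measles
sequence. Position 42 is shown with the A to C alternative, and
the position 96 change was reported for Zagreb only.

```python
def apply_point_mutations(sequence, mutations):
    """Apply (position, expected_base, new_base) changes, 1-based.

    Raises ValueError if the sequence does not carry the expected
    base at a position, which guards against off-by-one errors.
    """
    bases = list(sequence.upper())
    for position, expected, new in mutations:
        found = bases[position - 1]
        if found != expected:
            raise ValueError(
                f"position {position}: expected {expected}, found {found}"
            )
        bases[position - 1] = new
    return "".join(bases)


# Antigenomic (message) sense, as in the text.
MV_PROMOTER_MUTATIONS = [(26, "A", "T"), (42, "A", "C"), (96, "G", "A")]

# Placeholder 107-nt promoter region, NOT the measles sequence.
placeholder = ["C"] * 107
for position, expected, _ in MV_PROMOTER_MUTATIONS:
    placeholder[position - 1] = expected
wild_type = "".join(placeholder)

attenuated = apply_point_mutations(wild_type, MV_PROMOTER_MUTATIONS)
```

Supplying a real antigenomic sequence in place of the placeholder
would exercise the same check against actual coordinates.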

Based on Table 4 and the foregoing
discussion, the key attenuating sites for the L protein
are as follows: amino acid residues 331 (isoleucine →
threonine), 1409 (alanine → threonine), 1624
(threonine → alanine), 1649 (arginine → methionine),
1717 (aspartic acid → alanine), 1936 (histidine →
tyrosine), 2074 (glutamine → arginine) and 2114
(arginine → lysine). It is understood that the
nucleotide changes responsible for these amino acid
changes are not limited to those set forth in Table 4
of Example 1 below; all changes in nucleotides which
result in codons which are translated into these amino
acids are within the scope of this invention.

Human parainfluenza virus type 3 (HPIV-3) is
another nonsegmented, negative-sense, single stranded
enveloped RNA virus. HPIV-3 belongs to the Family
Paramyxoviridae (see Table 1). The genome of HPIV-3 is
15,462 nucleotides long and contains six non-overlapping
protein-encoding genes (57). Five of the genes each
encode a single virion structural protein; these are
designated NP (corresponding to the N protein of MV),
M, F, HN (hemagglutinin-neuraminidase) and L. The
sixth mRNA encodes the P protein, and by an overlapping
5' proximal open reading frame (ORF) encodes the C
protein, and by the RNA editing mechanism, also encodes
the D protein.

Like MV, HPIV-3 has a 3' non-protein coding
leader region of 55 nucleotides, but its 5'-trailer
region is 44 nucleotides long, whereas that of measles
is 37 nucleotides long. The polymerase
transcribes the genome in a linear, sequential,
start-stop manner which is guided by transcription
signals in the RNA template.

Attempts to develop a live attenuated HPIV-3
vaccine by passaging the wild-type virus JS strain
through cell culture at sub-optimal temperature have
produced promising results (7,57). Several "cold
passage" (cp) mutants were isolated for evaluation from
different passage levels of the JS strain. One such
mutant resulted from 45 serial passages and was
designated cp45.

This virus exhibited three interesting
properties: (1) cold adaptation (ca): the ability to
replicate efficiently at the suboptimal temperature of
20°C; (2) temperature sensitivity (ts): inability to
replicate in vitro at temperatures greater than or
equal to 39°C; and (3) small plaque morphology. This
mutant appeared to be a promising vaccine candidate
because: (a) its ca, ts and small plaque phenotypes are
stable after passage in cell culture; (b) its
replication is restricted in both the upper and lower
respiratory tract of hamsters; and (c) it induced
significant protection in hamsters against subsequent
challenge with wild-type HPIV-3 (58,59).

Evaluation of this strain in the rhesus
monkey showed the attenuation mutations in cp45 to be a
combination of ts and non-ts mutations (60).
Subsequent evaluation in chimpanzees indicated that
cp45 appeared to be satisfactorily attenuated while
still able to induce a high level of protection against
wild-type virus challenge (61). Later preliminary
clinical evaluation of cp45 in seronegative human
infants and small children suggested that this
candidate vaccine strain is suitably infectious and
attenuated, as well as being moderately immunogenic
(61).

The cp45 strain has been grown in both fetal
rhesus lung (FRhL) and Vero cells as follows: The PIV-3
cp45 virus grown in FRhL cells was prepared by
inoculating confluent FRhL cell monolayers in tissue
culture flasks at an MOI of 0.1-1.0. The infected cell
cultures were fed with EMEM medium and incubated at
32°C. About seven days later, when maximal cytopathic
effects (syncytia) were observed, the virus was
harvested by subjecting the cultures to one freeze-thaw
cycle, pooling the fluids and then storing the virus at
-70°C.

The PIV-3 cp45 virus grown in Vero cells was
prepared by inoculating the virus into a continuously
stirred bioreactor culture of confluent monolayers of
Vero cells on microcarrier beads. The infected
bioreactor culture was maintained at 30°C. The virus
was harvested 4-5 days later when syncytial CPE was
observed. The culture fluid containing the virus was
stored at -70°C.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of the HPIV-3 JS wild-type
strain (89) and the cp45 vaccine strain grown in FRhL
and Vero cells, as well as the deduced amino acid
sequences of the RNA polymerase (L protein) of these
HPIV-3 viruses, are set forth as follows with reference
to the appropriate SEQ ID NOS. contained herein:

Virus          Nucleotide Sequence    L Protein Sequence

Wild-Type
JS             SEQ ID NO:17           SEQ ID NO:18

Vaccine
FRhL cp45      SEQ ID NO:19           SEQ ID NO:20
Vero cp45      SEQ ID NO:21           SEQ ID NO:22

Each PIV-3 virus genome listed above is
15,462 nucleotides in length. Translation of the L
gene starts with the codon at nucleotides 8646-8648;
the translation stop codon is at nucleotides 15345-
15347. The translated L protein is 2,233 amino acids
long.

As detailed in Example 2 and Table 6 therein
below, based upon the differences between the wild-type
JS strain and the FRhL-grown cp45 mutant vaccine
strain, the key attenuating mutations for the HPIV-3 3'
genomic promoter region are nucleotide 23 (T → C),
nucleotide 24 (C → T), nucleotide 28 (G → T) and
nucleotide 45 (T → A) (in antigenomic, message sense).
As also detailed in Example 2 and Table 6 therein
below, key attenuating sites for the L protein of
HPIV-3 include the following: amino acid residues 942
(tyrosine → histidine), 992 (leucine → phenylalanine)
and 1558 (threonine → isoleucine).

In addition, the Vero-grown cp45 mutant
vaccine strain contains an additional mutation
resulting from a coding change in the L gene at amino
acid residue 1292 (leucine -> phenylalanine).

It is understood that the nucleotide changes
responsible for these amino acid changes are not
limited to those set forth in Example 2 below; all
changes in nucleotides which result in codons which are
translated into these amino acids are within the scope
of this invention.
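
Because the genetic code is degenerate, each of the substituted residues can be specified by more than one codon. The sketch below lists those codons using the standard genetic code; it assumes message-sense codons written with DNA letters (T for U), as in the sequence listings, and the helper names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid: str) -> list[str]:
    """All codons translated into the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


# Substituted residues named in the text for the HPIV-3 L protein.
for residue, new_aa in [(942, "H"), (992, "F"), (1558, "I")]:
    print(residue, new_aa, codons_for(new_aa))
```

Any codon in the printed lists encodes the same substituted residue, which is the sense in which the scope is not limited to the specific nucleotide changes of Example 2.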

Human respiratory syncytial virus (RSV) is
yet another nonsegmented, negative-sense, single
stranded enveloped RNA virus. RSV belongs to the
Subfamily Pneumovirinae and the genus Pneumovirus (see
Table 1).

Two major subgroups of human RSV, designated
A and B, have been identified based on reactivities of
the F and G surface glycoproteins with monoclonal
antibodies (62). More recently, the A and B lineages
of RSV strains have been confirmed by sequence analysis
(63,64). Bovine, ovine, and caprine strains of this
virus have also been isolated. The host specificity of
the virus is most clearly associated with the G
attachment protein, which is highly divergent between
the human and the bovine/ovine strains (65,66), and may
be influenced, at least in part, by receptor binding.

RSV is the primary cause of serious viral
pneumonia and bronchiolitis in infants and young
children. Serious disease, i.e., lower respiratory
tract disease (LRD), is most prevalent in infants less
than six months of age. It most commonly occurs in the
nonimmune infant's first exposure to RSV. RSV
additionally is associated with asthma and
hyperreactive airways and it is a significant cause of
mortality in "high risk" children with bronchopulmonary
dysplasia and congenital heart disease (CHD). It is
also one of the common viral respiratory infections
predisposing to otitis media in children. In adults,
RSV generally presents as uncomplicated upper
respiratory illness; however, in the elderly it rivals
influenza as a predisposing factor in the development
of serious LRD, particularly bacterial bronchitis and
pneumonia. Disease is always confined to the
respiratory tract, except in the severely
immunocompromised, where dissemination to other organs
can occur. Virus is spread to others by fomites
contaminated with virus-containing respiratory
secretions, and infection initiates through the nasal,
oral, or conjunctival mucosa.

RSV disease is seasonal and virus is usually
isolated only in the winter months, e.g., from November
to April in northern latitudes. The virus is
ubiquitous, and over 90% of children have been infected
at least once by 2 years of age. Multiple strains
cocirculate. There is no direct evidence of antigenic
drift (such as that seen with influenza A viruses), but
sequence studies demonstrating accumulation of amino
acid changes in the hypervariable regions of the G
and SH proteins suggest that immune pressure
may drive virus evolution.

In mouse and cotton rat models, both the F
and G proteins of RSV elicit neutralizing antibodies
and immunization with these proteins alone provides
long-term protection against reinfection (67,68).

In humans, complete immunity to RSV does not
develop and reinfections occur throughout life (69,70);
however, there is evidence that immune factors will
protect against severe disease. A decrease in severity
of disease is associated with two or more prior
infections and there is evidence that children infected
with one of the two major RSV subgroups may be somewhat
protected against reinfection with the homologous
subgroup (71), observations which suggest that a live
attenuated virus vaccine may provide protection
sufficient to prevent serious morbidity and mortality.
Infection with RSV elicits both antibody and
cell-mediated immunity. Serum neutralizing antibody to
the F and G proteins has been associated, in some
studies, with protection from LRD, although reduction
in upper respiratory disease (URD) has not been
demonstrated. High levels of serum antibody in infants
are associated with protection against LRD, and
administration of intravenous immunoglobulin with high
RSV neutralizing antibody titers has been shown to
protect against severe disease in high risk children
(70,72,73). The role of local immunity, and nasal
antibody in particular, is being investigated.

The RSV virion consists of a
ribonucleoprotein core contained within a lipoprotein
envelope. The virions of pneumoviruses are similar in
size and shape to those of all other paramyxoviruses.
When visualized by negative staining and electron
microscopy, virions are irregular in shape and range in
diameter from 150 to 300 nm (74). The nucleocapsid of
this virus is a symmetrical helix similar to that of
other paramyxoviruses, except that the helical diameter
is 12-15 nm rather than 18 nm. The envelope consists of
a lipid bilayer that is derived from the host membrane
and contains virally coded transmembrane surface
glycoproteins. The viral glycoproteins mediate
attachment and penetration and are organized separately
into virion spikes. All members of the paramyxovirus
subfamily have hemagglutinating activity, but this
function is not a defining feature for pneumoviruses,
being absent in RSV but present in PVM (75).
Neuraminidase activity is present in members of the
genera Paramyxovirus and Rubulavirus, and is absent in
Morbillivirus and pneumonia virus of mice (PVM) (75).

RSV possesses two subgroups, designated A and
B. The wild-type RSV (strain 2B) genome is a single
strand of negative-sense RNA of 15,218 nucleotides (SEQ
ID NO: 23) that is transcribed into ten major
subgenomic mRNAs. Each of the ten mRNAs encodes a
major polypeptide chain: three are transmembrane
surface proteins (G, F and SH); three are the proteins
associated with genomic RNA to form the viral
nucleocapsid (N, P and L); two are nonstructural
proteins (NS1 and NS2) which accumulate in the infected
cells but are also present in the virion in trace
amounts and may play a role in regulating transcription
and replication; one is the nonglycosylated virion
matrix protein (M); and the last is M2, another
nonglycosylated protein recently shown to be an
RSV-specified transcription elongation factor (see
Figure 3). These ten viral proteins account for nearly
all of the viral coding capacity.

The viral genome is encapsidated with the
major nucleocapsid protein (N), and is associated with
the phosphoprotein (P), and the large (L) polymerase
protein. These three proteins have been shown to be
necessary and sufficient for directing RNA replication
of cDNA-encoded RSV minigenomes (76). Further studies
have shown that for transcription to proceed with full
processing, the M2 protein (ORF 1) is required (74).
When the M2 protein is missing, truncated transcripts
predominate, and rescue of the full length genome does
not occur (74).

Both the M (matrix protein) and the M2
proteins are internal virion-associated proteins that
are not present in the nucleocapsid structure. By
analogy with other nonsegmented negative-stranded RNA
viruses, the M protein is thought to render the
nucleocapsid transcriptionally inactive before
packaging and to mediate its association with the viral
envelope. The NS1 and NS2 proteins have only been
detected in very small amounts in purified virions, and
at this time are considered non-structural. Their
functions are uncertain, though they may be regulators
of transcription and replication. Three transmembrane
surface glycoproteins are present in virions: G, F, and
SH. G and F (fusion) are envelope glycoproteins that
are known to mediate attachment and penetration of the
virus into the host cell. In addition, these
glycoproteins represent major independent immunogens
(77). The function of the SH protein is unknown,
although a recent report has implicated its involvement
in the fusion function of the virus (78).

The genomes of two wild-type RSV subgroup B
strains (2B and 18537) have now been sequenced in their
entirety (see SEQ ID NOS:23 and 25, discussed below).
Genomic RNA is neither capped nor polyadenylated (79).
In both the virion and intracellularly, genomic RNA is
tightly associated with the N protein.

The 3' end of the genomic RNA consists of a
44-nucleotide extragenic leader region that is presumed
to contain the major viral promoter (Fig. 3). The 3'
genomic promoter region is followed by ten viral genes
in the order 3'-NS1-NS2-N-P-M-SH-G-F-M2-L-5' (Fig. 3).
The L gene is followed by a 145-149 nucleotide
extragenic trailer region (see Figure 3). Each gene
begins with a conserved nine-nucleotide gene start
signal 3'-GGGGCAAAU (except for the ten-nucleotide gene
start signal of the L gene, which is 3'-GGGACAAAAU;
differences underlined). For each gene, transcription
begins at the first nucleotide of the signal. Each
gene terminates with a semi-conserved 12-14 nucleotide
gene end (3'-AG(U/G)(U/A)ANNN(U/A)A(3-5)) (where N can
be any of the four bases and A(3-5) denotes a run of
three to five A residues) that directs transcription
termination and polyadenylation (Fig. 3). The first
nine genes are non-overlapping and are separated by
intergenic regions that range in size from 3 to 56
nucleotides for RSV B strains (Fig. 3). The intergenic
regions do not contain any conserved motifs or any
obvious features of secondary structure and have been
shown to have no influence on expression of the
preceding and succeeding genes in a minireplicon system
(Fig. 3). The last two RSV genes overlap by 68
nucleotides (Fig. 3). The gene-start signal of the L
gene is located inside of, rather than after, the M2
gene. This 68 nucleotide overlap sequence encodes the
last 68 nucleotides of the M2 mRNA (exclusive of the
poly-A tail), as well as the first 68 nucleotides of
the L mRNA.
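
The semi-conserved gene-end signal described above can be written as a search pattern. This is a minimal sketch under stated assumptions: the motif is read as A, G, U or G, U or A, A, any three bases, U or A, then a run of three to five A residues (which gives the stated 12-14 nucleotides), and the sequence is supplied as an RNA string in the same orientation in which the motif is printed. The test sequence is invented for illustration and is not taken from any RSV genome.

```python
import re

# Gene-end motif as read from the text: AG(U/G)(U/A)ANNN(U/A) followed by 3-5 A.
GENE_END = re.compile(r"AG[UG][UA]A[ACGU]{3}[UA]A{3,5}")


def find_gene_ends(rna: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched motif) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in GENE_END.finditer(rna.upper())]


# Invented test sequence (hypothetical), with one motif starting at position 4.
example = "CCGAGUUACGCUAAAAGGC"
print(find_gene_ends(example))
```

A pattern of this kind only flags candidate sites; whether a candidate actually terminates transcription would have to be established experimentally, as in the minireplicon studies cited in the text.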

Ten different species of subgenomic
polyadenylated mRNAs and a number of polycistronic
polyadenylated read-through transcripts are the
products of genomic transcription (74).
Transcriptional mapping studies using UV light-mediated
genomic inactivation showed that RSV genes are
transcribed in their 3' to 5' order from a single
promoter near the 3' end (80). Thus, RSV synthesis
appears to follow the single entry, sequential
transcription model proposed for all Mononegavirales
(16,81). According to this model, the polymerase (L)
contacts genomic RNA in the nucleocapsid form at the 3'
genomic promoter region and begins transcription at the
first nucleotide. RSV mRNAs are co-linear copies of
the genes, with no evidence of mRNA editing or
splicing.

Sequence analysis of intracellular RSV mRNAs
showed that synthesis of each transcript begins at the
first nucleotide of the gene start signal (74). The 5'
ends of the mRNAs are capped with the structure
m7G(5')ppp(5')Gp (where the underlined G is the first
template nucleotide of the mRNA) and the mRNAs are
polyadenylated at their 3' ends (82). Both of these
modifications are thought to be made
co-transcriptionally by the viral polymerase. Three
regions of the RSV 3' genomic promoter have been found
to be important as cis-acting elements (83). These
regions are the first ten nucleotides (presumably
acting as a promoter), nucleotides 21-25, and the gene
start signal located at nucleotides 45-53 (83). Unlike
other Paramyxovirinae, such as measles, Sendai and
PIV-3, the remainder of the leader and non-coding
region of the NS1 gene of RSV was found to be highly
tolerant of insertions, deletions and substitutions
(83).

Additionally, saturation mutagenesis
(wherein each base is replaced independently by each of
the other three bases and compared for translation and
replication efficiencies) within the first 12
nucleotides of the 3' genomic promoter region showed
that substitutions in a U-tract located at nucleotides
6-10 were highly inhibitory (83). In contrast,
the first five nucleotides were relatively tolerant of
a number of substitutions and two of them at position
four were up-regulatory mutations, resulting in a four-
to 20-fold increase in RSV-CAT RNA replication and
transcription. Using a bi-cistronic minireplicon
system, gene-start and gene-end motifs were shown to be
signals for mRNA synthesis and appear to be
self-contained and largely independent of the nature of
adjoining sequence (84).

The L gene start signal lies 68 nucleotides
upstream of the M2 gene-end signal, resulting in gene
overlap (Fig. 3) (74). The presence of the M2 gene-end
signal within the L gene results in a high frequency of
premature termination of L gene transcripts. Full
length L mRNA is much less abundant and is made when
the polymerase fails to recognize the M2 gene-end
motif. This results in much lower transcription of L
mRNA. The gene overlap seems incompatible with a model
of linear sequential transcription. It is not known
whether the polymerase that exits the M2 gene jumps
backward to the L gene-start signal or whether there is
a second, internal promoter for L gene transcription
(74). It is also possible that the L gene is
accessible by a small fraction of polymerases that fail
to start transcription at the M2 gene-start signal and
slide down the M2 gene to the L gene-start signal.

The relative abundance of each RSV mRNA
decreases with the distance of its gene from the
promoter, presumably due to polymerase fall-off during
sequential transcription (80). Gene overlap is a
second mechanism that reduces the synthesis of full
length L mRNA. Also, certain mRNAs have features that
might reduce the efficiency of translation. The
initiation codon for SH mRNA is in a suboptimal Kozak
sequence context, while the G ORF begins at the second
methionyl codon in the mRNA.
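
The gradient of mRNA abundance can be pictured with a toy model. The sketch below is an illustration only: it assumes, purely for the sake of the example, that a constant fraction of polymerases falls off at every gene junction, and the value 0.2 is arbitrary and not taken from the cited studies. It also ignores the additional reduction of full length L mRNA caused by the M2/L gene overlap.

```python
# Gene order from the 3' promoter, as given in the text.
RSV_GENE_ORDER = ["NS1", "NS2", "N", "P", "M", "SH", "G", "F", "M2", "L"]


def relative_mrna_levels(fall_off: float, genes=RSV_GENE_ORDER) -> dict[str, float]:
    """Abundance of each mRNA relative to the promoter-proximal gene.

    fall_off is the hypothetical fraction of polymerases lost at each junction.
    """
    if not 0.0 <= fall_off < 1.0:
        raise ValueError("fall_off must be in [0, 1)")
    return {gene: (1.0 - fall_off) ** i for i, gene in enumerate(genes)}


levels = relative_mrna_levels(0.2)  # arbitrary illustrative value
for gene, level in levels.items():
    print(f"{gene:>3}  {level:.3f}")
```

Under this assumption the promoter-distal L gene is always the least transcribed, which is the qualitative behavior the text describes.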

RSV RNA replication is thought (74) to follow
the model proposed from studies with vesicular
stomatitis virus and Sendai virus (16,81). This
involves a switch from the stop-start mode of mRNA
synthesis to an antiterminator read-through mode. This
results in synthesis of positive sense
replication-intermediate (RI) RNA that is an exact
complementary copy of genomic RNA. This serves in turn
as the template for the synthesis of progeny genomes.
The mechanism involved in the switch to the
antiterminator mode is proposed to involve
cotranscriptional encapsidation of the nascent RNA by N
protein (16,81). RNA replication in RSV, like that in
other nonsegmented negative-strand RNA viruses, is
dependent on ongoing protein synthesis (85). Predicted
RI RNA has been detected for the standard virus as well
as the RSV-CAT minigenome (74,85). RI RNA was 10-20
fold less abundant intracellularly than was the progeny
genome both for the standard and the minigenome system.

The nucleotide sequences (in positive strand,
antigenomic, message sense) of various wild-type,
vaccine and revertant RSV strains, as well as the
deduced amino acid sequences of the RNA polymerase (L
protein) of these RSV viruses, are set forth as follows
with reference to the appropriate SEQ ID NOS. contained
herein:

Virus            Nucleotide Sequence   L Protein Sequence

Wild-Type
  2B             SEQ ID NO: 23         SEQ ID NO: 24
  18537          SEQ ID NO: 25         SEQ ID NO: 26

Vaccine
  2B33F          SEQ ID NO: 27         SEQ ID NO: 28
  2B20L          SEQ ID NO: 29         SEQ ID NO: 30

Revertant
  2B33F TS(+)    SEQ ID NO: 31         SEQ ID NO: 32
  2B20L TS(+)    SEQ ID NO: 33         SEQ ID NO: 34

Each RSV virus genome encodes an L protein
that is 2,166 amino acids long. Genome length and
other nucleotide information is as follows:

Virus            Genome Length   L Start Codon   L Stop Codon

Wild-Type
  2B             15218           8502-8504       15000-15002
  18537          15229           8509-8511       15007-15009

Vaccine
  2B33F          15219           8503-8505       15001-15003
  2B20L          15219           8503-8505       15001-15003

Revertant
  2B33F TS(+)    15219           8503-8505       15001-15003
  2B20L TS(+)    15219           8503-8505       15001-15003


As detailed in Example 3 (especially Tables 7
and 8) below, the key attenuating mutations for the RSV
subgroup B 3' genomic promoter region are nucleotide 4
(C -> G), and the insertion of an additional A in the
stretch of A's at nucleotides 6-11 (in antigenomic
message sense). As also detailed in Example 3 below,
the key potentially attenuating sites for the L protein
of RSV are as follows: amino acid residues 353
(arginine -> lysine), 451 (lysine -> arginine), 1229
(aspartic acid -> asparagine), 2029 (threonine ->
isoleucine) and 2050 (asparagine -> aspartic acid). It
is understood that the nucleotide changes responsible
for these amino acid changes are not limited to those
set forth in Example 3 below; all changes in
nucleotides which result in codons which are translated
into these amino acids are within the scope of this
invention.

WO 98/13501 PCT/US97/16718

The attenuated viruses of this invention
exhibit a substantial reduction of virulence compared
to wild-type viruses which infect human and animal
hosts. The extent of attenuation is such that symptoms
of infection will not arise in most immunized
individuals, but the virus will retain sufficient
replication competence to be infectious in and elicit
the desired immune response profile in the vaccinee.

The attenuated viruses of this invention may
be used to formulate a vaccine. To do so, the
attenuated virus is adjusted to an appropriate
concentration and formulated with any suitable vaccine
adjuvant, diluent or carrier. Physiologically
acceptable media may be used as carriers. These
include, but are not limited to: an appropriate
isotonic medium, phosphate buffered saline and the
like. Suitable adjuvants include, but are not limited
to, MPL™ (3-O-deacylated monophosphoryl lipid A; RIBI
ImmunoChem Research, Inc., Hamilton, MT) and IL-12
(Genetics Institute, Cambridge, MA).

In one embodiment of this invention, the
formulation including the attenuated virus is intended
for use as a vaccine. The attenuated virus may be mixed
with cryoprotective additives or stabilizers such as
proteins (e.g., albumin, gelatin), sugars (e.g.,
sucrose, lactose, sorbitol), amino acids (e.g., sodium
glutamate), saline, or other protective agents. This
mixture is maintained in a liquid state, or is then
desiccated or lyophilized for transport and storage and
mixed with water immediately prior to administration.

Formulations comprising the attenuated
viruses of this invention are useful to immunize a
human or animal subject to induce protection against
infection by the wild-type counterpart of the
attenuated virus. Thus, this invention further
provides a method of immunizing a subject to induce
protection against infection by an RNA virus of the
Order Mononegavirales by administering to the subject
an effective immunizing amount of a vaccine formulation
incorporating an attenuated version of that virus as
described hereinabove.

A sufficient amount of the vaccine in an
appropriate number of doses must be administered to the
subject to elicit an immune response. Persons skilled
in the art will readily be able to determine such
amounts and dosages. Administration may be by any
conventional effective route, such as intranasally,
parenterally, orally, or topically applied to any
mucosal surface such as an intranasal, oral, ocular,
vaginal or rectal surface, for example by an aerosol
spray. The preferred means of administration is
intranasal administration.

In another embodiment of this invention, an
isolated nucleic acid molecule having the complete
viral nucleotide sequence of either the wild-type
viruses or vaccine viruses described herein is used to
generate oligonucleotide probes (from either positive
strand antigenomic message sense or negative strand
complementary genomic sense) and to express peptides
(from positive strand antigenomic message sense only),
which are used to detect the presence of those
wild-type virus and/or vaccine strains in samples of
body fluids and tissues. The nucleotide sequences are
used to design highly specific and sensitive diagnostic
tests to detect the presence of the virus in a sample.

Polymerase chain reaction (PCR) primers are
synthesized with sequences based on the viral wild-type
or vaccine sequences described herein. The test sample
is subjected to reverse transcription of RNA, followed
by PCR amplification of selected cDNA regions
corresponding to the nucleotide sequence described
herein which have nucleotides which are distinct for a
defined strain of virus. Amplified PCR products are
identified on gels and their specificity confirmed by
hybridization with specific nucleotide probes.
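
Selecting regions with nucleotides distinct for a defined strain amounts to comparing aligned sequences position by position. The sketch below shows one way this could be done; it assumes the sequences are already aligned and of equal length, and the three short sequences and strain labels are invented placeholders, not the sequences of any SEQ ID NO.

```python
def strain_specific_positions(aligned: dict[str, str], target: str) -> list[int]:
    """1-based positions where the target strain differs from every other strain."""
    reference = aligned[target]
    others = [seq for name, seq in aligned.items() if name != target]
    if any(len(seq) != len(reference) for seq in others):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i + 1
        for i, base in enumerate(reference)
        if all(other[i] != base for other in others)
    ]


# Hypothetical aligned fragments, for illustration only.
aligned = {
    "wild-type": "ACGTACGTAC",
    "vaccine":   "ACGAACGTTC",
    "isolate-X": "ACGTACGTAC",
}
print(strain_specific_positions(aligned, "vaccine"))  # [4, 9]
```

Positions returned for a strain are candidates for placing the 3' end of a discriminating primer or the center of a hybridization probe.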

ELISA tests are used to detect the presence
of antigens of the wild-type or vaccine viral strains.
Peptides are designed and selected to contain one or
more distinct residues based on the wild-type or
vaccine sequences described herein. These peptides are
then coupled to a carrier (e.g., keyhole limpet
hemocyanin (KLH)) and used to immunize animals (e.g.,
rabbits) for the production of monospecific polyclonal
antibody. A selection of these polyclonal antibodies,
or a combination of polyclonal and monoclonal
antibodies, can then be used in a "capture ELISA" to
detect antigens produced by those viruses.

Samples of the Moraten measles virus vaccine
strain were deposited by Applicants with the American
Type Culture Collection, 12301 Parklawn Drive,
Rockville, Maryland 20852, U.S.A., under the provisions
of the Budapest Treaty for the Deposit of
Microorganisms for the Purposes of Patent Procedures
("Budapest Treaty") and have been assigned ATCC
accession number VR2587. Samples of the HPIV-3 virus
Vero-grown cp45 vaccine strain were deposited by
Applicants with the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Maryland 20852,
U.S.A., under the provisions of the Budapest Treaty and
have been assigned ATCC accession number VR2588.
Samples of the 2B wild-type RSV virus were deposited by
Applicants with the American Type Culture Collection,
12301 Parklawn Drive, Rockville, Maryland 20852,
U.S.A., under the provisions of the Budapest Treaty and
have been assigned ATCC accession number VR2586.

Given these three deposited strains and the
sequence information for these and other strains
provided herein, one can use the site-directed
mutagenesis and rescue techniques described above to
introduce mutations into (or restore a wild-type
genotype to) any of the strains described herein, as
well as to make additional mutations in these strains
from the panel of mutations set forth in Tables 3, 4
and 6-8 below.

In order that this invention may be better
understood, the following examples are set forth. The
examples are for the purpose of illustration only and
are not to be construed as limiting the scope of the
invention.



Examples 



Standard molecular biology techniques are
utilized according to the protocols described in
Sambrook et al. (86).

Example 1
Measles

Moraten MV vaccine virus was grown once,
directly from the Attenuvax™ vaccine vial (Lot #0716B),
the Schwarz vaccine virus was grown once (Lot
96G04/M179 G41D), while the Zagreb and Rubeovax™
vaccine viruses were each grown twice in Vero cells
before RNAs were made for sequence analysis. MV
wild-type isolate Montefiore (56) was passed 5-6 times
in Vero cells before extraction of RNA materials and,
similarly, MV wild-type isolates 1977 and 1983 (14)
were grown 5-7 times before extracting materials for
analysis. The Edmonston wild-type isolate received from
Dr. J. Beeler (CBER) (see Fig. 1) was the original
Edmonston isolate, already passaged seven times in
human kidney cells and three times in Vero cells before
receipt, and was further passaged once in Vero cells
before being used for sequence analysis.

RNA was prepared by infecting Vero cells at a
multiplicity of infection (m.o.i.) of 0.1 to 1.0; the
infected cultures were allowed to reach maximum
cytopathology before being harvested. Total RNA from
measles virus-infected cells was extracted using
Trizol™ reagent (Gibco-BRL).

The total RNA isolated from Vero cell passage
material was amplified by the Reverse Transcriptase-PCR
(Perkin-Elmer/Cetus) procedure using measles (Edmonston
B strain (19)) specific primer pairs spanning the 3'
and 5' promoter regions and the L gene of the viral
genome. Table 2 presents these primer sequences. The
primers of SEQ ID NOS:35-54, 74, 77 and 78 are in
antigenomic message sense. The primers of SEQ ID
NOS:55-73, 75, 76 and 79 are in genomic negative-sense.

Table 2

Primers for PCR and Sequencing MV L Genes
and Genomic Termini

9047 CATATCACTCACTCTGGGATGGAG 9070 (SEQ ID NO: 35)
9371 TCAGAACATCAAGCACCGCC 9390 (SEQ ID NO: 36)
9741 ACAGTCAAGACTGAGATGAG 9760 (SEQ ID NO: 37)
AAGAGTCAGATACATGTGGA (SEQ ID NO: 38)
ACATGAATCAGCCTAAAGTC (SEQ ID NO: 39)
CCGAAAGAGTTCCTGCGTTACGACC (SEQ ID NO: 40)
11083 CAGTCCACACAAGTACCAGG 11102 (SEQ ID NO: 41)
11461 GTCAGAAGCTGTGGACCATC 11480 (SEQ ID NO: 42)
11841 AATATTGCTACAACAATGGC 11860 (SEQ ID NO: 43)
12196 ACTCTTCATTCCTAGACTGG 12215 (SEQ ID NO: 44)
12542 GTCCAATTATGACTATGAAC 12561 (SEQ ID NO: 45)
12891 AGAACAGACATGAAGCTTGC 12910 (SEQ ID NO: 46)
13232 CCAACAAGGAATGCTTCTAG 13251 (SEQ ID NO: 47)
13551 ACAGCACTATCTATGATTGACCTGG 13575 (SEQ ID NO: 48)
13930 GCAACATGGTTTACACATGC 13949 (SEQ ID NO: 49)
14280 AGATTGAGAGTTGATCCAGG 14299 (SEQ ID NO: 50)
14629 AGGAGATACTTAAACTAAGC 14648 (SEQ ID NO: 51)
14981 TAAGCTTATGCCTTTCAGCG 15000 (SEQ ID NO: 52)
15337 TTAACGGACCTAAGCTGTGC 15356 (SEQ ID NO: 53)
15671 GAAACAGATTATTATGACGG 15690 (SEQ ID NO: 54)

9290 CGGGCTATCTAGGTGAACTTCAGG 9267 (SEQ ID NO: 55)
9500 ATTTGGATATGGAATATGAG 9481 (SEQ ID NO: 56)
9840 ACTCAACTGAACTACCAGTG 9821 (SEQ ID NO: 57)
10181 AAGAACATCATGTATTTCAG 10162 (SEQ ID NO: 58)
10549 TTATCAACGCACTGCTCATG 10530 (SEQ ID NO: 59)
10919 ATTTTCAGCAATCACTTGGCATGCC 10895 (SEQ ID NO: 60)
11280 GCCTCTGTGCAAACAAGCTG 11261 (SEQ ID NO: 61)
11638 TCTCTAGTTACTCTAGCAGC 11619 (SEQ ID NO: 62)
12010 AGGTCGTTGTTTGTGAGGAG 11991 (SEQ ID NO: 63)
12361 TCGTCCTCTTCTTTACTGTC 12342 (SEQ ID NO: 64)
12689 CCGTCCTCGAGCTAGCCTCG 12670 (SEQ ID NO: 65)
13052 CTCCTCCAGGCTCACATTGG 13033 (SEQ ID NO: 66)
13420 GGGTTGGTACATAGCTCTGC 13401 (SEQ ID NO: 67)
13767 CACCCATCTGATATTTCCCTGATGG 13743 (SEQ ID NO: 68)
14099 TGGTTGACAGTACAAATCTG 14080 (SEQ ID NO: 69)
14460 CTGAAATGGGAAGATTGTGC 14441 (SEQ ID NO: 70)
14820 AGCAATCTACACTGCCTACC 14801 (SEQ ID NO: 71)
15180 TCACAGATGATTCAATTATC 15161 (SEQ ID NO: 72)
15530 GATCCTAGATATAAGTTCTC 15511 (SEQ ID NO: 73)

1 ACCAAACAAAGTTGGGTAAGG 21 (SEQ ID NO: 74)
GGGGGATCC 100 ATCCCTAATCCTGCTCTTGTCCC 78 (SEQ ID NO: 75)
200 GATTCCTCTGATGGCTCCAC 181 (SEQ ID NO: 76)
15721 TAACAGTCAAGGAGACCAAAG 15741 (SEQ ID NO: 77)
GGGAAGCTT 15801 AACCCTAATCCTGCCCTAGGTGG 15823 (SEQ ID NO: 78)
15894 ACCAGACAAAGCTGGGAATAGA 15873 (SEQ ID NO: 79)
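
Each entry in Table 2 pairs a sequence with the genome coordinates it spans, so the entries can be checked for internal consistency. The sketch below assumes that ascending coordinates indicate an antigenomic (message sense) primer and descending coordinates a genomic (negative-sense) primer, as the surrounding text implies; the function names are illustrative. The three primers are taken from Table 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-letter sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_sense(start: int, seq: str, end: int) -> str:
    """Check that the primer length matches its coordinates and report its sense."""
    expected = abs(end - start) + 1
    if len(seq) != expected:
        raise ValueError(f"length {len(seq)} does not match span {expected}")
    return "antigenomic" if start < end else "genomic"


PRIMERS = [
    (9047, "CATATCACTCACTCTGGGATGGAG", 9070),   # SEQ ID NO: 35
    (9290, "CGGGCTATCTAGGTGAACTTCAGG", 9267),   # SEQ ID NO: 55
    (11083, "CAGTCCACACAAGTACCAGG", 11102),     # SEQ ID NO: 41
]

for start, seq, end in PRIMERS:
    sense = primer_sense(start, seq, end)
    # For a genomic-sense primer, show the equivalent message-sense sequence.
    shown = seq if sense == "antigenomic" else reverse_complement(seq)
    print(min(start, end), max(start, end), sense, shown)
```

A length check of this kind is also how a partly illegible coordinate can be confirmed when the other coordinate and the sequence are both legible.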

Overlapping PCR fragments of the complete
viral genome were directly sequenced without cloning to
achieve the consensus sequence, by the dideoxy
terminator cycle sequencing method using both strands
(ABI PRISM 377 sequencer and ABI PRISM sequencing Kit).
To determine the sequence at the absolute termini, a
ligation procedure described previously was used (55).

To test this hypothesis, the nucleotide
sequences were determined for the non-protein coding
regulatory regions and the L gene of the progenitor
Edmonston wild-type MV isolate, for the available
vaccine strains derived from this isolate, as well as
for other wild-type strains. Nucleotide (in
antigenomic, message sense) and amino acid differences
were then compared and aligned as set forth in Tables
3-5 (differences are in italics):

Table 3

Differences in MV 3' Genomic Promoter Region
Nucleotide Sequence

                   Nucleotide number:
Virus              26    42    50    96

Edmonston w-t      A     A     G     G

Vaccines:
  Rubeovax™        T     C     G     G
  Moraten          T     C     G     G
  Schwarz          T     C     G     G
  Zagreb           T     T     G     A
  AIK-C            T     C     G     G

Wild-Types:
  1977             A     A     A     G
  1983             A     A     A     G
  Montefiore       A     A     A     G

Table 4

Differences in MV L Nucleotides and Amino Acids
Between Edmonston Wild-Type and Vaccine Strains

                 331   1409  1624  1649  1717  1887  1936  2074  2114

Edmonston w-t    ATT   GCA   ACC   AGG   GAT   AAC   CAT   CAA   AGA
Mutation         ACT   ACA   GCC   ATG   GCT   GAC   TAT   CGA   AAA

Edmonston w-t    I     A     T     R     D     N     H     Q     R
Rubeovax™ vac.   I     A     T     M     A     D     H     Q     R
Moraten vac.     T     A     T     M     A     D     H     Q     K
Schwarz vac.     T     A     T     M     A     D     H     Q     K
Zagreb vac.      I     T     T     R     A     N     H     Q     R
AIK-C vac.       I     T     A     R     A     N     Y     R     R
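
The codon and amino acid rows of Table 4 can be checked against each other by translating each codon pair. The sketch below uses the standard genetic code with message-sense codons in DNA letters; the data structure and function names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Amino acid position -> (Edmonston wild-type codon, mutant codon), from Table 4.
TABLE_4 = {
    331: ("ATT", "ACT"), 1409: ("GCA", "ACA"), 1624: ("ACC", "GCC"),
    1649: ("AGG", "ATG"), 1717: ("GAT", "GCT"), 1887: ("AAC", "GAC"),
    1936: ("CAT", "TAT"), 2074: ("CAA", "CGA"), 2114: ("AGA", "AAA"),
}


def describe(position: int, wild_type: str, mutant: str) -> str:
    """Summarize one codon change and the residue change it encodes."""
    changed = [i + 1 for i in range(3) if wild_type[i] != mutant[i]]
    return (f"{position}: {wild_type}->{mutant}  "
            f"{CODON_TABLE[wild_type]}->{CODON_TABLE[mutant]}  "
            f"(codon position {changed})")


for position, (wild_type, mutant) in TABLE_4.items():
    print(describe(position, wild_type, mutant))
```

Each pair differs at a single nucleotide, and the translated residues match the wild-type row and the substituted residues that appear in the vaccine rows of the table.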
Example 2
PIV-3 

A comparison of sequences (in antigenomic
message sense) of the parental wild-type JS strain of
PIV-3 virus and the FRhL-grown and Vero-grown forms of
the cp45 mutant is set forth in Table 6. Where a
codon change does not result in an amino acid change,
Table 6 states "none", followed by the name of the
unchanged amino acid.

Sequence analysis of the parental wild-type
JS strain of PIV-3 virus and the FRhL-grown cp45 mutant
showed that the latter contained 20 nucleotide changes.
Four changes were in the noncoding 3'-leader region at
nucleotide positions 23 (T -> C), 24 (C -> T), 28 (G ->
T) and 45 (T -> A) (in antigenomic, message sense).
When considered in the genomic, negative sense, the
change at position 28 from the smaller pyrimidine ("C")
to the larger purine ("A") may change the size of the
region flanked by the conserved regions of the 3'
genomic promoter region, resulting in an altered
spatial presentation of the cis-acting signals to the
polymerase.
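
The leader changes are listed in antigenomic sense, whereas the size argument is made in genomic sense, so each change has to be complemented before its bases are classified. The sketch below performs that conversion; it assumes DNA letters (T standing for U), as used in the sequence listings, and the names are illustrative.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}

# 3'-leader changes in antigenomic (message) sense: position -> (wild-type, cp45).
LEADER_CHANGES = {23: ("T", "C"), 24: ("C", "T"), 28: ("G", "T"), 45: ("T", "A")}


def to_genomic_sense(wild_type: str, mutant: str) -> tuple[str, str]:
    """Complement a single-base change so it reads on the genomic strand."""
    return COMPLEMENT[wild_type], COMPLEMENT[mutant]


def base_class(base: str) -> str:
    return "purine" if base in PURINES else "pyrimidine"


for position, (wild_type, mutant) in LEADER_CHANGES.items():
    g_wt, g_mut = to_genomic_sense(wild_type, mutant)
    print(f"{position}: antigenomic {wild_type}->{mutant}, "
          f"genomic {g_wt}->{g_mut} "
          f"({base_class(g_wt)} -> {base_class(g_mut)})")
```

For position 28 this gives C -> A on the genomic strand, a pyrimidine-to-purine change, in agreement with the description above.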

Nine changes were coding changes in the NP,
M, F, HN and L genes. The other seven changes were
non-coding or silent changes in the NP, P, F, HN and L
genes or the NP untranslated region (UTR). The cp45
mutant has been demonstrated to have poor transcription
activity at non-permissive temperatures due to its ts
phenotype (87). This ts phenotype has now been mapped
to the viral L gene (88). Because the cp45 virus has
been shown to function normally with regard to
mutations in the HN and F glycoproteins (87), this
supports the implication that mutations in the
3'-leader and L gene contributed to the attenuating
phenotype of this virus.

Thus, the four 3' leader specific changes in
FRhL-grown cp45 and the three coding changes in the L
gene at amino acid positions 942 (Tyr -> His), 992 (Leu
-> Phe) and 1558 (Thr -> Ile) contributed significantly
to the attenuation phenotype of the candidate cp45
vaccine strain.

Furthermore, the Vero-grown cp45 mutant
vaccine strain contains an additional mutation
resulting from a coding change in the L gene (marked
with an asterisk in Table 6) at amino acid residue 1292
(leucine -> phenylalanine).

The first two amino acid changes in the L
protein (at positions 942 and 992) map to one of the
highly conserved areas among all Paramyxovirus L genes.
The fourth amino acid change (at position 1558) maps to
the area joining two conserved blocks corresponding to
the change at amino acid 1717 in the MV vaccine
strains.

The published literature (89) sets forth only
18 changes between the antigenomic message sense
sequences of the JS and FRhL-grown cp45 strains.
Sixteen of these changes were found by applicants.

The published literature did not report four
changes found by applicants: in the 3' leader at
nucleotide 45 (T -> A), in the NP UTR at nucleotide 62
(A -> T), or the changes in amino acids in the NP
protein resulting from the changes at nucleotide 397 (T
-> C), leading to the amino acid change (Val -> Ala) and
nucleotide 1275 (T -> G), leading to the amino acid
change (Ser -> Ala) (nucleotide changes in antigenomic,
message sense). Nor did the published literature
report the additional potentially attenuating mutation
in the L protein found by applicants in the Vero-grown
cp45 strain resulting from the change at nucleotide
12521 (A -> T), leading to the change in amino acid
1292 (Leu -> Phe).
Example 3
RSV Subgroup B

The temperature-sensitive (ts) phenotype is
strongly associated with attenuation in vivo; in
addition, some non-ts mutations may also be
attenuating. Identification of ts and non-ts
attenuating mutations was achieved by sequence analysis
and evaluation of ts, cold-adapted (ca), and in vivo
growth phenotypes of RSV mutants and revertants.

The genomes of the following five RSV 2B
strains have now been completely sequenced: 2B parent,
2B33F, one revertant designated 2B33F TS(+), 2B20L and
one revertant designated 2B20L TS(+). The 2B33F and
2B20L strains are ts and ca and are described in U.S.
Serial No. 08/059,444 (90), which is hereby
incorporated by reference. After identifying regions
where mutations in 2B33F and 2B20L are located, nine
additional isolates of 2B33F "revertants" obtained
following in vitro passaging at 39°C and in vivo
passaging in African Green Monkeys or chimpanzees, and
nine additional isolates of 2B20L "revertants" obtained
following in vitro passaging at 39°C have been
sequenced in those regions. The ts, ca, and
attenuation phenotypes of many of these revertants have
now been characterized and assessed. Correlations
between the ts phenotype, vaccine attenuation and
sequence changes have been identified.

A summary of results is presented in Tables
7-12.
Table 7

Sequence comparison between RSV 2B and 2B33F strains

Gene/      Nucl. pos.†   Nucleotide changes                      Amino acid
region     (3' end       RSV 2B   RSV 2B33F   RSV 2B33F TS(+),   changes
           of vRNA)                           5a revertant

Genomic    4             C        G           G                  non-coding
Promoter   6             -        extra A     extra A            non-coding
M          4175          T        C           C                  non-coding
           4199          T        C           C                  non-coding
SH         4329          T        C           C                  Phe-Leu (10)
           4409          T        C           C                  none Ile (36)
           4420          T        C           C                  Ile-Thr (40)
           4442          T        C           C                  none His (47)
           4454          T        C           C                  none Cys (51)
           4484          T        C           C                  none Tyr (61)
           4497          T        C           C                  Stop-Gln (66)
           4505          T        C           C                  none Ser (68)
           4525          T        C           C                  Ile-Thr (75)
           4526          T        C           C                  Ile-Thr (75)
           4542          T        C           C                  Stop-Gln (81)
           4561          T        C           C                  Leu-Pro (87)
           4575          T        C           C                  Trp-Arg (92)
           4598          T        C           C                  none Thr (99)
L          9559          G        A           A                  Arg-Lys (353)
           9853*         A        G           A                  Lys-Arg (451)*
           12186         G        A           A                  Asp-Asn (1229)
           14587         C        T           T                  Thr-Ile (2029)
           15071         A        G           G                  non-coding

† For 2B33F and 2B33F TS(+), nucl. pos. numbers
are one larger than for 2B for M, SH & L genes

* At pos. 9853, the Lys-Arg change has reverted
back to Lys in the 2B33F TS(+) strain
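
The dagger footnote means that a position quoted in RSV 2B numbering must be shifted by one to locate the same base in 2B33F or its revertants (compare 9853 in Table 7 with 9854 in Table 10). The sketch below illustrates that conversion. It assumes, as an interpretation not stated explicitly in the text, that the shift is caused by the single extra A inserted at promoter position 6; the function name is hypothetical.

```python
# Position of the extra A in the 3' genomic promoter (Table 7).
# Treating this insertion as the cause of the +1 shift is an assumption.
INSERTION_SITE = 6


def to_2b33f_numbering(pos_2b, insertion_site=INSERTION_SITE, inserted=1):
    """Convert an RSV 2B nucleotide position to 2B33F numbering."""
    if pos_2b > insertion_site:
        return pos_2b + inserted
    return pos_2b


if __name__ == "__main__":
    for pos in (4, 4175, 4329, 9853, 14587, 15071):
        print(pos, "->", to_2b33f_numbering(pos))
```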
Table 8

Sequence comparison between RSV 2B and 2B20L strains

Gene/      Nucl. pos.†   Nucleotide changes                      Amino acid
region     (3' end       RSV 2B   RSV 2B20L   RSV 2B20L TS(+),   changes
           of vRNA)                           R1 revertant

Genomic    4             C        G           G                  non-coding*
Promoter   6                      extra A     extra A            non-coding*
L          8963          C        T           T                  none Thr (154)
           13347         A        A           G                  Asn-Asp (1616)
           14587         C        T           T                  Thr-Ile (2029)*
           14649         A        G           G                  Asn-Asp (2050)
           14650         A        A           T                  Asn-Asp-Val (2050)**

† For 2B20L and 2B20L TS(+), nucl. pos. numbers
are one larger than for 2B for L gene
* Mutation is common in 2B33F and 2B20L strains
** At pos. 14650, the mutation suppresses the ts
phenotype in 2B20L TS(+) revertant
Table 10
2B33F Revertants

               TS(+) In vitro       AGM                         Chimp
base no.†      5a     4a     3b     pp2    pp4    pp6    pp7    1A     3A     5A

M
4176, 4200     S      S      S      S      S      S      S      S      S      S
SH
14 bases*      S      S      S      S      S      S      S      S      S      S
L
9560           S      S      S      S      S      S      S      S      S      S
9854           2B     2B     2B     2B     S      S      S      ND     2B     2B
12187          S      S      S      S      S      S      S      S      S      S
14588          S      S      S      S      S      S      S      ND     S      S
15072          S      S      S      S      S      S      S      S      S      S

Phenotype
ts             2B     2B     2B     r      r      S      S      2B     2B     2B
ca             S      S      S      2B     S      2B     S      ND     ND     ND
Attenuated     r      r      r      (r)    (r)    S      S      ND     r      r

† These 2B33F revertant base nos. are one larger than for 2B for M,
SH and L genes

* bases 4330, 4410, 4421, 4443, 4455, 4485, 4498, 4506, 4526, 4527, 4543,
4562, 4576, 4599
S = same base as 2B33F
2B = reversion to 2B base or complete reversion in phenotype
r = moderate reversion in phenotype
(r) = slight reversion in phenotype
ND = not done
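
The association between reversion at base 9854 and loss of the ts phenotype can be read directly off Table 10. The sketch below is an illustrative tabulation only: the two rows are transcribed from the table (with letter case normalized), and the function name is hypothetical.

```python
# Rows transcribed from Table 10, in isolate order.
ISOLATES = ["5a", "4a", "3b", "pp2", "pp4", "pp6", "pp7", "1A", "3A", "5A"]
BASE_9854 = ["2B", "2B", "2B", "2B", "S", "S", "S", "ND", "2B", "2B"]
TS_PHENOTYPE = ["2B", "2B", "2B", "r", "r", "S", "S", "2B", "2B", "2B"]


def summarize_reversion():
    """Return (isolates reverted at 9854, those also fully reverted in ts)."""
    reverted = [i for i, b in zip(ISOLATES, BASE_9854) if b == "2B"]
    full_ts = [
        i
        for i, b, t in zip(ISOLATES, BASE_9854, TS_PHENOTYPE)
        if b == "2B" and t == "2B"
    ]
    return reverted, full_ts


if __name__ == "__main__":
    reverted, full_ts = summarize_reversion()
    print("Reverted at 9854:", reverted)
    print("Of these, complete ts reversion:", full_ts)
```

Under this reading, every isolate that reverted at base 9854 shows at least partial loss of the ts phenotype, and only the AGM isolate pp2 shows a moderate rather than complete reversion.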
Table 11
2B20L Revertants

               TS(+) In vitro Isolates
base no.†      R1    R2    R3A   R4A   R5A   R6A   R7A   R8A   R9A   R10A

L
8964           S     S     S     S     S     S     S     S     S     S
13348          C*    S     ND    S     S     ND    S     S     S     S
14588          S     S     S     S     S     S     S     S     S     S
14650          S     S     2B    S     2B    2B    S     S     2B    2B
14651          A*    A*    S     A*    S     S     A*    A*    S     S

Phenotype
ts             2B    2B    ND    ND    ND    ND    ND    ND    2B    2B
Attenuated     r     r     ND    ND    ND    ND    ND    ND    r     r

† These 2B20L revertant base nos. are one larger than for 2B for L
genes

S = same base as 2B20L
2B = reversion to 2B base
r = moderate reversion in phenotype
* = base change, different from 2B or 2B20L
ND = not done
Table 12

RSV 2B, ts and Revertant Strains: Phenotype Summary

Virus Isolate         Source                        In Vitro      In Vivo
                                                    Phenotype     Attenuation
                                                    ts     ca     Cotton Rat   AGM

RSV 2B                Wild-type Parent Strain

RSV 2B33F             ca, ts mutant isolated        ++++          ++++         +++
                      from 2B, cold-passaged
                      x 33

RSV 2B33F - 5a        2B33F spinner passage                ++     ++           +
TS(+)                 plaque picked at 39°C

RSV 2B33F - 4a        2B33F spinner passage                ++     ++           ND
TS(+)                 plaque picked at 39°C

RSV 2B33F - 3b        2B33F spinner passage         -      ++     ++           ND
TS(+)                 plaque picked at 39°C

AGM pp2               2B33F-infected AGM A2,        +             +++          ND
                      d7 nasal wash plaque
                      picked at 32°C

AGM pp4               2B33F-infected AGM A2,        +      ++     +++          ND
                      d7 nasal wash plaque
                      picked at 32°C

AGM pp6               2B33F-infected AGM A4,        ++++          ++++         ND
                      d12 nasal wash plaque
                      picked at 32°C

AGM pp7               2B33F-infected AGM A4,        ++++   ++     ++++         ND
                      d12 nasal wash plaque
                      picked at 32°C

Chimp pp1A            2B33F-infected chimp                 ND     ND           ND
                      #1552, d4 tracheal
                      lavage, plaque picked
                      at 32°C

Chimp pp3A            2B33F-infected chimp                 ND     ++           ND
                      #1560, d6 tracheal
                      lavage, plaque picked
                      at 32°C

Chimp pp5A            2B33F-infected chimp                 ND                  ND
                      #1563, d10 tracheal
                      lavage, plaque picked
                      at 32°C
Table 12 (continued)
RSV 2B, ts and Revertant Strains: Phenotype Summary

Virus Isolate         Source                        In Vitro      In Vivo
                                                    Phenotype     Attenuation
                                                    ts     ca     Cotton Rat   AGM

RSV 2B20L             ca, ts mutant isolated        ++++   ++     ++++         ++++
                      from 2B, cold-passaged
                      x 20

RSV 2B20L R1          2B20L spinner passage                ND     ++           ND
TS(+)                 plaque picked at 39°C

RSV 2B20L R2          2B20L spinner passage                ND     ++           ND
TS(+)                 plaque picked at 39°C

RSV 2B20L R9          2B20L spinner passage                ND     ++           ND
TS(+)                 plaque picked at 39°C

RSV 2B20L R10         2B20L spinner passage                ND     ++           ND
TS(+)                 plaque picked at 39°C

ND = not done
- = wild-type phenotype, i.e., not temperature sensitive, not cold
adapted, not attenuated
+ to ++++ = increasing levels of temperature sensitivity, cold-
adaptation or attenuation
Several significant observations can be drawn
from these data:

a. As shown in Tables 7 (for 2B33F) and 8 (for
2B20L), there are relatively few sequence changes
identified in the two mutant strains: RSV 2B33F
differs from parental RSV 2B by two changes at the 3'
genomic promoter region, two changes at the non-coding
5'-end of the M gene, and four coding changes plus one
non-coding (poly(A) motif) change in the RNA dependent
RNA polymerase coding L gene. In addition, 14 changes
mapped to the SH gene alone. RSV 2B20L differs from
its RSV 2B parent only at seven nucleotide positions,
of which three are common with 2B33F virus, including
two changes at the 3' genomic promoter and one coding
change in the L gene. Two additional unique changes of
2B20L virus mapped to the coding region of the L gene.
Potentially attenuating mutations at the non-coding 3'
genomic promoter region and the RNA dependent RNA
polymerase gene have been identified.

b. Two ts mutations can be identified in the L
gene of the attenuated virus strains 2B33F and 2B20L:

(i) In 2B33F, a mutation at nucleotide position
9853 (A -> G) leading to a coding change in L protein
at amino acid 451 (Lys -> Arg) is clearly associated
with the ts and attenuation phenotypes. Reversion at
this site alone in the 2B33F TS(+) 5a strain is
responsible for complete restoration of growth at 39°C
(Table 9) and partial reversion in attenuation in
animals. This association with the ts and attenuation
phenotypes was also supported by partial sequence
analyses of six additional "full TS revertants"
(designated 4a, 3b, pp2, 3A, 5a, 5A) isolated from cell
culture and from chimps, in which only the nucleotide
9853 mutation reverted (Tables 10-12) (note that one
AGM (African Green Monkey) isolate which reverted at
9853 only partially reverted in ts phenotype). This
amino acid 451 mutation (Lys -> Arg) is amenable to
stabilization in cDNA infectious clone constructs, by
inserting a second mutation to stabilize the codon,
thereby lessening the likelihood that it will revert
back to Lys.
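
The idea of stabilizing the codon can be illustrated by counting how many nucleotide substitutions separate each arginine codon from the nearest lysine codon under the standard genetic code. This is a hedged sketch only: the text does not say which Arg codon would be used in a cDNA construct, and the names below are hypothetical.

```python
# Lysine and arginine codons in the standard genetic code (message sense).
LYS_CODONS = ("AAA", "AAG")
ARG_CODONS = ("AGA", "AGG", "CGA", "CGC", "CGG", "CGT")


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


def steps_back_to_lys(codon):
    """Minimum number of substitutions needed to turn a codon into Lys."""
    return min(hamming(codon, lys) for lys in LYS_CODONS)


# Arg codon -> minimum substitutions required to revert to Lys.
STABILITY = {codon: steps_back_to_lys(codon) for codon in ARG_CODONS}

if __name__ == "__main__":
    for codon, steps in sorted(STABILITY.items(), key=lambda kv: kv[1]):
        print(f"{codon}: {steps} substitution(s) from Lys")
```

An A -> G change at the second codon position produces AGA or AGG, each a single substitution away from Lys; a CGN codon would require at least two substitutions to revert, which is the sense in which a second mutation could stabilize the codon.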

(ii) In 2B20L, a mutation at base 14,649 (A -> G)
leading to a coding change in the L protein (amino acid
position 2,050, Asn -> Asp) appears to be associated
with the ts and attenuation phenotypes. This aspartic
acid at amino acid 2050 invariably reverts back
(Asp -> Asn) in TS(+) revertants or changes to a
different amino acid (Asp -> Val) by nucleotide
substitution at position 14,650 (A -> T) (Tables 8,
11). The above observation is based on complete
sequence analysis of the TS(+) revertant R1 and partial
sequence of several additional TS(+) revertants (R2,
R4A, R7A, R8A) at selected regions (Table 11). An
additional mutation is seen in the R1 revertant at
nucleotide position 13,347 (amino acid 1616, Asn ->
Asp) associated with the above reversion. However, the
effect of this mutation on the ts phenotype is not
known; the L gene of other revertants has not been
sequenced completely.

c. Three base changes are common to 2B33F and
2B20L strains of virus:

(i) A change at position 14,587 (C -> T) with a
corresponding change (Thr -> Ile) at amino acid 2029 is
present in both 2B33F and 2B20L (Tables 7, 8). This
nucleotide "T" substitution was found to be present in
10% of the population of the progenitor RSV 2B strain
and may have been preferred during the attenuation
process. No wild-type base "C" was found in the 2B33F
and 2B20L virus.

(ii) Two mutations are seen in the 2B33F and 2B20L
3' genomic promoter region: nucleotide 4 (C -> G) and
the insertion of an extra A in the stretch of A's at
positions 6-11 (in antigenomic, message sense). When
the sequences of selected TS(+) revertants were
analyzed, these mutations were seen to have been
retained in the 2B33F TS(+) 5a (Table 7) and the 2B20L
TS(+) R1 (Table 8) revertants. These non-coding, cis-
acting mutations remained associated with partial viral
attenuation.

Expression using the minireplicon RSV-CAT
system for the analysis of these cis-acting changes has
shown that the 3' genomic promoter nucleotide 4 (C -> G)
change upregulates
transcription/replication in this in vitro system when
the 2B progenitor virus or either of the 2B33F or 2B33F
TS(+) viruses provided helper L gene functions (the N, P
and M2 genes are identical in these viruses).

Complementation analysis of the 2B33F 3'
genomic promoter and the helper functions provided by
the progenitor RSV 2B virus or the 2B33F and 2B33F TS(+)
viruses by this RSV-CAT minireplicon system has also
been conducted. All three viruses supported both the
2B and 2B33F 3' genomic promoter mediated
transcription/replication functions. However, the
2B33F and 2B33F TS(+) viruses preferred their 2B33F 3'
genomic promoters. This analysis clearly shows
co-evolution of 3' genomic promoter changes during the
vaccine attenuation process, along with the RNA
dependent RNA polymerase gene. Reversion of the ts
phenotype in the 2B33F revertant 5a, which sequence
analysis attributed to reversion of the single L protein
amino acid 451 (Arg -> Lys), was clearly demonstrated by
its support of the transcription/replication functions
of the RSV-CAT minireplicon at 37°C. The 2B33F virus did
not provide helper functions to the RSV-CAT minireplicon
(with 2B or 2B33F 3' genomic promoters) at 37°C.

d. A biased hypermutation of SH seen in 2B33F is
present in all 2B33F revertants, regardless of
phenotype, and is not seen in 2B20L, which is ts, ca,
and attenuated. Thus, there are no data at this time
that associate this mutation with any biological
phenotype.

Another wild-type RSV designated 18537 was
also sequenced and compared to the sequence of the
wild-type RSV 2B strain. With one exception, at all
the critical residues described above, the two wild-
type strains were identical. For 2B, the codon ACA at
nucleotides 14586-14588 encodes a Thr at amino acid
2029 of the L protein, while for 18537, the codon ATT
at nucleotides 14593-14595 encodes an Ile at amino acid
2029 (the L gene start codon is at nucleotides 8509-
8511 in 18537, compared to 8502-8504 in 2B).
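
The nucleotide coordinates quoted for amino acid 2029 follow from the position of the L gene start codon in each strain. As a small worked check (the function name is an illustrative assumption), the first and last nucleotides of the codon for residue n are start + 3(n - 1) and start + 3(n - 1) + 2:

```python
def codon_coordinates(start_codon_first_nt, amino_acid_number):
    """Return (first, last) nucleotide positions of the codon for a residue.

    start_codon_first_nt is the genome position of the first nucleotide of
    the gene's start codon; residues are numbered from 1.
    """
    first = start_codon_first_nt + 3 * (amino_acid_number - 1)
    return first, first + 2


if __name__ == "__main__":
    print("2B, aa 2029:", codon_coordinates(8502, 2029))
    print("18537, aa 2029:", codon_coordinates(8509, 2029))
    print("2B, aa 451:", codon_coordinates(8502, 451))
    print("2B, aa 2050:", codon_coordinates(8502, 2050))
```

The same arithmetic places nucleotide 9853 at the second position of the codon for amino acid 451, and nucleotides 14649 and 14650 at the first and second positions of the codon for amino acid 2050, in agreement with the changes described for 2B33F and 2B20L.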

Example 4
PCR Assay to Detect Measles Virus

A 21 year old patient was admitted to a
hospital with a three week history of progressive
non-productive cough, shortness of breath, and fever. His
symptoms failed to improve following treatment with
clarithromycin for seven days or after a similar course
of treatment with atovaquone. Concomitant complaints
of right upper quadrant abdominal pain proved
recalcitrant to omeprazole and antacids. Relevant
past medical history included Factor VIII deficiency
and HIV infection diagnosed 3-4 years prior to this
hospital admission. One year earlier, he had received
a booster immunization of measles-mumps-rubella (MMR)
vaccine as required for college enrollment.

Bronchoalveolar lavage and transbronchial
biopsies performed two days after admission to the
hospital demonstrated reactive hyperplasia and alveolar
lining cell desquamation with minimal chronic
inflammation. No microorganisms were revealed by Gram,
methenamine silver, or PAS stains. CT scans of the
chest showed multiple, ill-defined, confluent nodules
at the left lung base. Despite administration of
empiric antimicrobials for opportunistic bacterial,
mycobacterial, and fungal pathogens commonly
responsible for pulmonary complications of advanced HIV
disease, the patient became and remained febrile to
39°C. A left-sided pleural effusion developed;
diagnostic thoracentesis showed it to be exudative but
otherwise non-diagnostic. Bronchoalveolar lavage
performed three weeks later only demonstrated alveolar
histiocytes, some of which were hemosiderin laden, a
few lymphocytes, and neutrophils. FITE, AFB, and
methenamine silver stains again were negative.

Two weeks thereafter, a wedge resection of
the left lung was performed through CT-guided
minithoracotomy. Multiple tissue sections revealed
nodular areas of acute and chronic inflammation with
regions of necrosis and fibrosis. Numerous
multinucleated giant cells were present, some of which
contained both intracytoplasmic and intranuclear
inclusions suggestive of measles virus giant cell
pneumonia. Special stains for bacteria, fungi, P.
carinii, and acid fast organisms again gave negative
results. Electron microscopic examination of sections
of this lung biopsy revealed particles morphologically
consistent with paramyxoviruses such as measles virus.
Serum anti-measles IgM titers determined by a solid
phase hemadsorbant assay were negative, as was a
subsequent IgM capture immunoassay.

Two weeks later, Rhesus monkey kidney (RMK)
tissue culture cells inoculated with the patient's lung
biopsy material revealed cytopathic changes
characteristic of measles virus infection.
Confirmation was obtained using an immunofluorescence
assay with monoclonal antibodies directed to measles
virus. Based upon this diagnosis, oral ribavirin
1000 mg B.I.D. was given for 14 days. Unfortunately,
the patient progressively deteriorated, eventually
dying two months later.

In order to ascertain the nature of the
measles virus present in the patient, reverse
transcription and PCR amplification of virus obtained
from infected tissues were performed, followed by
sequence analysis. The measles virus isolated from
Rhesus monkey kidney cells inoculated with tissue from
this patient's lung biopsy was propagated by two serial
passages in the continuous Vero (monkey kidney) tissue
culture cell line. Total infected cell RNA was
extracted at the second Vero cell passage using TRIzol
reagent (Life Technologies, Grand Island, NY) according
to the manufacturer's protocol. Total RNA was
similarly extracted from the patient's lung biopsy
material. The measles virus vaccine strain (Moraten)
currently used in the United States as a component of
the trivalent MMR vaccines was obtained in its
univalent form (Attenuvax™, Merck, Sharpe, & Dohme).
This virus was passaged once in Vero cells and total
vaccine infected cellular RNA then was extracted as
described above.

Each of these RNA preparations was reverse
transcribed (RT) to cDNA using random hexameric primers
and Moloney murine leukemia virus reverse transcriptase
(Perkin-Elmer/Cetus RT-PCR kit reagents, Perkin-Elmer-
Cetus, Branchburg, NJ). The cDNA then was amplified by
PCR using measles virus-specific oligodeoxynucleotide
primer pairs whose design was based on the Edmonston
measles virus sequence described above. These PCR
products comprised a set of overlapping DNA fragments
spanning the entire 15,894 nucleotide long measles
genome. A consensus genomic sequence was established
by direct analysis of each PCR product, without
cloning, using the dideoxy terminator cycle-sequencing
method established by the manufacturer (ABI PRISM 377
sequencer and ABI PRISM DNA sequencing kit; Perkin-
Elmer/Cetus, Foster City, CA). Both strands of the
PCR-amplified DNA products were analyzed to eliminate
possible sequencing ambiguities.
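
The requirement that the PCR products form overlapping fragments spanning the whole 15,894 nucleotide genome can be illustrated with a simple tiling calculation. This is a sketch under stated assumptions: the text gives neither the amplicon length nor the overlap, so the 1,500 nt fragments and 200 nt overlaps used here are hypothetical values chosen only for illustration, as is the function name.

```python
GENOME_LENGTH = 15894  # measles virus genome length given in the text


def tile_amplicons(genome_len, amplicon_len, overlap):
    """Return 1-based inclusive (start, end) coordinates of overlapping
    fragments that together cover positions 1..genome_len."""
    if not 0 <= overlap < amplicon_len:
        raise ValueError("overlap must be non-negative and shorter than the amplicon")
    step = amplicon_len - overlap
    fragments = []
    start = 1
    while True:
        end = min(start + amplicon_len - 1, genome_len)
        fragments.append((start, end))
        if end == genome_len:
            break
        start += step
    return fragments


if __name__ == "__main__":
    # Hypothetical fragment size and overlap; not taken from the source.
    for start, end in tile_amplicons(GENOME_LENGTH, 1500, 200):
        print(f"{start}-{end}")
```

In practice the fragment boundaries would be dictated by where suitable virus-specific primer pairs can be placed, so a regular tiling such as this one is only a starting point.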

The nucleotide sequences of selected regions
of the measles virus genomes present in the patient's
viral isolate, as well as in the diseased lung tissue,
were compared with that of the Moraten vaccine virus,
as well as with the nucleotide sequences of other
measles virus wild-type and vaccine strains. This
sequence analysis revealed identity to the Moraten
vaccine strain rather than demonstrating relatedness to
past or currently circulating wild-type viruses or
other measles vaccine strains.
Example 5
ELISA to Detect RSV

An ELISA test is used to detect the presence
of RSV. Peptides are designed and selected based on
homologies to the RSV sequences described herein to be
specific for all subgroup B strains, or for individual
wild-type, vaccine or revertant RSV subgroup B strains
described herein. These peptides are then coupled to
KLH and used to immunize rabbits for the production of
monospecific polyclonal antibody. A selection of these
polyclonal antibodies, or a combination of polyclonal
and monoclonal antibodies, is then used in a "capture
ELISA" to detect the presence of an RSV antigen.



WO 98/13501 PCT/US97/16718 



Bibliography

1. Kapikian, A.Z., et al., Am. J. Epidemiol., 89, 405-421 (1969).

2. Chin, J., et al., Am. J. Epidemiol., 89, 449-463 (1969).

3. Fulginiti, V.A., et al., Am. J. Epidemiol., 89, 435-448 (1969).

4. Prince, G.A., et al., J. Virology, 57, 721-728 (1986).

5. Kim, H.W., et al., Pediatrics, 52, 56-63 (1973).

6. Hodes, D.S., et al., Proc. Soc. Exp. Biol. Med., 145, 1158-1164 (1974).

7. Belshe, R.B., and Hissom, F.K., J. Med. Virol., 10, 235-242 (1982).

8. Black, F.L., et al., Am. J. Epidemiol., 124, 442-452 (1986).

9. Lennon, J.L., and Black, F.L., J. Pediatrics, 108, 671-676 (1986).

10. Pabst, H.F., et al., Pediatr. Infect. Dis. J., 11, 525-529 (1992).

11. Centers for Disease Control, MMWR, 40, 369-372 (1991).

12. Centers for Disease Control, MMWR, 41:S6, 1-12 (1992).

13. King, G.E., et al., Pediatr. Infect. Dis. J., 10, 883-887 (1991).

14. Rota, J.S., et al., Virology, 188, 135-142 (1992).

15. Rota, J.S., et al., Virus Res., 31, 317-330 (1994).

16. Lamb, R.A., and Kolakofsky, D., pages
1177-1204 of Volume 1, Fields Virology, B.N. Fields, et
al., Eds. (3rd ed., Raven Press, 1996).



17. Sidhu, M.S., et al., Virology, 193, 50-65 (1993).

18. Garcin, D., et al., EMBO J., 14, 6087-6094 (1995).

19. Radecke, F., et al., EMBO J., 14, 5773-5783 (1995).

20. Collins, P.L., et al., Proc. Natl. Acad. Sci., USA, 92, 11563-11567 (1995).

21. Published European Patent Application No. 702,085.

22. Published International Application No. WO 96/10400.

23. Baron, M.D., and Barrett, T., J. Virology, 71, 1265-1271 (1997).

24. Published International Application No. WO 97/06270.

25. U.S. Provisional Patent Application No. 60/047575.

26. Published International Application No. WO 97/12032.

27. Kato, A., et al., Genes to Cells, 1, 569-579 (1996).

28. Sidhu, M.S., et al., Virology, 208, 800-807 (1995).

29. Shaffer, M.F., et al., J. Immunol., 41, 241-256 (1941).

30. Enders, J.F., et al., N. Engl. J. Med., 263, 153-159 (1960).

31. Enders, J.F., and Peebles, M.E., Proc. Soc. Exp. Biol. Med., 86, 227-286 (1954).

32. Schwarz, A.J.F., Am. J. Dis. Child., 103, 216-219 (1962).

33. Griffin, D.E., and Bellini, W.J., pages
1267-1312 of Volume 1, Fields Virology, B.N. Fields, et
al., Eds. (3rd ed., Raven Press, 1996).



34. Birrer, M.J., et al., Virology, 108, 381-390 (1981).

35. Birrer, M.J., et al., Nature, 293, 67-69 (1981).

36. Norby, E., et al., pages 481-507, in The
Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).

37. Peebles, M.E., pages 427-456, in The
Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).

38. Egelman, E.H., et al., J. Virol., 63, 2233-2243 (1989).

39. Udem, S.A., et al., J. Virol. Methods, 8, 123-136 (1984).

40. Udem, S.A., and Cook, K.A., J. Virol., 49, 57-65 (1984).

41. Moyer, S.A., and Horikami, S.M., pages
249-274, in The Paramyxoviruses, D. Kingsbury, Ed.
(Plenum Press, 1991).

42. Blumberg, B., et al., pages 235-247, in
The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).

43. Berrett, T., et al., pages 83-102, in
The Paramyxoviruses, D. Kingsbury, Ed. (Plenum Press, 1991).

44. Tordo, N., et al., Sem. in Virology, 3, 341-357 (1992).

45. Cattaneo, R., et al., EMBO J., 6, 681-688 (1987).

46. Crowley, J.C., et al., Virology, 164, 498-506 (1988).

47. Banerjee, A.K., and Barik, S., et al., Virology, 188, 417-428 (1992).

48. Castaneda, S.J., and Wong, T.C., J. Virol., 63, 2977-2986 (1989).



49. Chan, J., et al., pages 221-231, in
Genetics and Pathogenicity of Negative Stranded
Viruses, B.W.J. Mahy and D. Kolakofsky, Eds. (Elsevier
Biomedical Press, 1989).

50. Blumberg, B., et al., Cell, 23, 837-845 (1981).

51. Blumberg, B., et al., Cell, 32, 559-567 (1983).

52. Kolakofsky, D., and Blumberg, B.M.,
pages 203-213, in Virus Persistence, B.M.J. Mahy, et
al., Eds. (Cambridge University Press, 1982).

53. Castaneda, S.J., and Wong, T.C., J. Virol., 64, 222-230 (1990).

54. Curran, J.A., and Kolakofsky, D., Virology, 182, 168-176 (1991).

55. Sidhu, M.S., et al., Virology, 193, 66-72 (1993).

56. Sidhu, M.S., et al., Virology, 202, 631-641 (1994).

57. Collins, P.L., et al., pages 1205-1241
of Volume 1, Fields Virology, B.N. Fields, et al., Eds.
(3rd ed., Raven Press, 1996).

58. Crookshanks, F.K., and Belshe, R.B., J. Med. Virol., 13, 243-249 (1984).

59. Crookshanks-Newman, F.K., and Belshe, R.B., J. Med. Virol., 18, 131-137 (1986).

60. Hall, S.L., et al., Virus Res., 22, 173-184 (1992).

61. Karron, R.A., et al., J. Inf. Dis., 172, 1445-1450 (1995).

62. Anderson, L.J., et al., J. Infect. Dis., 151, 626-633 (1985).

63. Collins, P.L., pages 103-162 of The
Paramyxoviruses, D.W. Kingsbury, Ed. (Plenum Press, NY
and London, 1991).



64. Sullender, W.M., J. Virology, 65, 5425-5434 (1991).

65. Lerch, R.A., et al., J. Virology, 64, 5559-5569 (1990).

66. Mallipeddi, S.K., and Samal, S.K., J. Gen. Virol., 74, 2787-2791 (1993).

67. Johnson, P.R., et al., J. Virology, 61, 3163-3166 (1987).

68. Stott, E.J., et al., J. Virology, 61, 3855-3861 (1987).

69. Henderson, F.W., et al., N. Engl. J. Med., 300, 530-534 (1979).

70. Hall, S.L., et al., J. Infect. Dis., 163, 693-698 (1991).

71. Mufson, M.A., et al., J. Gen. Virol., 66, 2111-2124 (1985).

72. Glezen, W.P., et al., Am. J. Dis. Child., 140, 543-546 (1986).

73. Hemming, V.G., et al., Clin. Microbiol. Res., 8, 22-33 (1995).

74. Collins, P.L., et al., pages 1313-1351
of Volume 1, Fields Virology, B.N. Fields, et al.,
Eds. (3rd ed., Raven Press, 1996).

75. Ling, R., and Pringle, C.R., J. Gen. Virol., 70, 1427-1440 (1989).

76. Yu, Q., et al., J. Virology, 69, 2412-2419 (1995).

77. McIntosh, K., and Chanock, R.M., pages
1045-1072 of Virology, B.N. Fields, et al., Eds. (2nd
ed., Raven Press, 1990).

78. Heminway, B.R., et al., page 167 of
Abstracts of the IX International Congress of Virology,
P17-2, (1993).

79. Mink, M.A., et al., Virology, 185, 615-624 (1991).



80. Dickens, L.E., et al., J. Virology, 52, 364-369 (1990).

81. Wagner, R.R., and Rose, J.K., pages
1121-1135 of Volume 1, Fields Virology, B.N. Fields,
et al., Eds. (3rd ed., Raven Press, 1996).

82. Barik, S., J. Gen. Virol., 74, 485-490 (1993).

83. Collins, P.L., et al., pages 259-264 of
Vaccines 93: modern approaches to new vaccines
including prevention of AIDS, F. Brown et al., Eds.
(Cold Spring Harbor Laboratory Press, NY, 1993).

84. Kuo, L., et al., J. Virology, 70, 6892-6901 (1996).

85. Huang, Y.T., and Wertz, G.W., J. Virology, 43, 150-157 (1982).

86. Sambrook, J., et al., Molecular Cloning:
A Laboratory Manual, 2nd ed., Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y. (1989).

87. Ray, R., et al., J. Virol., 69, 1959-1963 (1995).

88. Ray, R., et al., J. Virol., 70, 580-584 (1996).

89. Stokes, A., et al., Virus Research, 30, 43-52 (1993).

90. U.S. Patent Application No. 08/059,444.



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Udem, Stephen A. 

Sidhu, Mohinderjit S. 
Tatem, Joanne M. 
Murphy, Brian R. 
Randolph, Valerie B. 

(ii) TITLE OF INVENTION: 3» Genomic Promoter Region and 

Polymerase Gene Mutations Responsible for Attenuation in 
Viruses of the Order Designated Mononegavirales 

(iii) NUMBER OF SEQUENCES: 79 

<iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: American Home Products Corporation 

(B) STREET: One Campus Drive 

(C) CITY: Parsippany 

(D) STATE: New Jersey

(E) COUNTRY: United States 

(F) ZIP: 07054 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: US 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Gordon, Alan M. 

(B) REGISTRATION NUMBER: 30,637 

(C) REFERENCE/DOCKET NUMBER: 33,294 PCT 

(ix) TELECOMMUNICATION INFORMATION: 
(A) TELEPHONE: 973/683-2157
(B) TELEFAX: 973/683-4117 

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15894 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTG CACTTAGGAT
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGCCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGGC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AGATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGTTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTTT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAGACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTCTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 
ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTC 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCGGGA GATTCCTCAA 240 
TTACCACTCG ATCTAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATATTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTTCG 
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GGAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 
AAATGGGGGA AACTGCACCA TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTCGA TCCAGCATAT TTCAGACTAG 
GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAAGATGCA AGGCTTGTTT CAGAGATCGC AATGCATACT ACAGAGGACA 
GGATCAGTAG AGCGGTTGGA CCCAGACAAT CCCAAGTGTC ATTCCTACAC GGTGATCAAA 
ATGAAAATGA GCTACCGAGA TGGGGGGGTA AGGAAGATAT GAGGGTCAAA CAGAGTCGGG 
GAGAAGCCAG AGAGAGCTAC AGAGAAACCA GGCCCAGCAG AGCAAGTGAC GCGAGAGCTA 
CCCATCCTCC AACCGACACA CCCTTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGAGTGTA CAATGACAGA GATCTTCTAG 
ACTAGGTGCA AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCACCA ACCATCCACT CCCACGATTG 
GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 
ATATCAGACA ACCCAGGACA GGAGCGAGCC GCCTGCAAGG AAGAGAAGGC AAGCAGTCCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GATCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAGGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCAAA GCTTAGGAAA ACTCTCAATG TTCCCCCGCC CCCGGACCCT 
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGTACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCACCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGGGA AGTTGAGTCA ATCAAGAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCTTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCTGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
ATCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCGGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGACATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC TAGCTAATAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GAAAAGATGA ATGTTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCTCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CTCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCATACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCAGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATTGGC CATGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA AAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCCTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTTATCA TAAATGATGA CCAAGGATTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA GACGACCCTC CTCACAATGA CAGCCAGAAG 
GCCCGGAAAA AAAGGCCCCC TCCGAAAGAC TCCACAGACC AAATGAGAGG CCAGCCAGCA 
GCTGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CATAAGGCCA 
CCACCAGCCA TCCCAATCTG CATCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 
TGCCCCCCAC CCAAACCACC AACCGCATCC CTACCACCCC CGGGAAAGAA ACCCCCAGCA 
ACTGGAAGAG CCCTTCCCCT TTCCCTCAAC ACAAGAACTC CACAACCGAA CCACACAAGC 
GACCGAGGTG ACCCAACCGC AGGCACCCGA CTCCCTAGAC AGATCCTCTC CCCCTGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGACAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCTGCGC 
ACCCCAGCCC CGATCCGGCG GGCAGCCACC CAACCCTAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCTTCCTCT TCTCGAAGGG ACTAAAAGAT CAATCCACCA CATCCGACGA CACTCAACTC 
CCCGTCCCTA AAGGAGACAC CGGGAATCCC GGAATTAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGTTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCAG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGG GCAAGTCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCTAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT CGGAGGAGAT 
ATCAATAAGG TGTTAGAAAA GCTCGGATAT AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ACCCGACGCT GTCCGAGATC AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACGACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCTGGGTCTT TTGGGAACCG GTTCATTTTG 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTGTCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA 
AGATCGAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGGAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TCCCCTTTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TTATCAACAG AGAACACCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CAATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TTGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAACCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAGGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGGAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTTCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTCTCA ACGGATGACC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTAT CCCGACAACA AGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGAGTCTTGT CTGTTGATCT 
GAGTCTAACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAGTCCAA CCACAACAAT GAGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCC TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC AACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATCCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGATCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATAGCAG 
ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCCG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ACAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAC TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GAAATTCGCT GTACTCTAAA GTCAGTAATA 
AGGTTTTCCA ATGCTTGAGG GACACTAATT CACGGCTTGG TCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TAACATTTGA GCTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CTGCTATGAC CATTGATGCT AGATATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AATTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACGGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCACGAG TTAGTTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACACCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC CTCAGGTGAA GGATTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTCATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCC CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATA TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCTAAAGAT CTCAAAGAAA 
GTCACAGAGG GGGGCCAGTC CTAAAAACCT ACTCCCGAAG CCCAGCCCAC ACAAATACCA 
GGAACGTGAG GGCAGCAAAA GGGTTTATAG GGTTCCCTCA GATAATTCGG CAGGACCAAG 
ACACTAATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACAACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCATTA GTGCAAGGGG ACAATCAGAC CATAGCTGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AATGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTTTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAG ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCAGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCATCACTGA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAGCAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA 
TCCACAGTCC AAACCCAATG TTAAAGGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC AGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTAACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCC CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CAAGGCAAAG GGCTAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCACATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTCCTA GGGTTGGGCG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CAATGATAGA TCATCCCAGG ATACCCAGCT 
CTCGCAAGCT AGAGCTGAGG GCAGAGCTGT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTAG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCACTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTTACA TTTCTTTTGT GTGAAAGTGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC GGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTTCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ATACAAGCAA GCACAATCTT CCCATTTCTG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 
ACGGCTTATT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TCTCTGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACATG GGTAGGCAGT GTAGATTGCT 
TCAATTACAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCCAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAGGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGAATC ACTCGCAAAT 
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 
ATCTCAAGTC CGGTTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 
TAACAGTCAA GGAGACCAAG GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCCAGG TGGTTAGGCA TTATTTGTAA 
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 

(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asn Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Val Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Ala His Thr Asn Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Ile Ile Arg Gln 
625 630 635 640 

Asp Gln Asp Thr Asn His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 
705 710 715 720 

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Trp Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Gln Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Val Asp Cys Phe Asn Tyr Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 
1875 1880 1885 

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 



Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp 
2180 



(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTGTTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGTAG TAGTGATCAA TCCAGGTCCG 540 

GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATTCTA GCCCAAATTT GGGTCTTGCT CGCGAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720 

TAGTTGGTGA ATTCAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGGACACCCG 
GGAACAAACC AAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTA ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 
AAATGGGAGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCCCTG CTCTGGAGCT ATGCCATGGG AGTAGGGGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTGAGGAGG TCAGCTGGGA AAGTCAGTTC CACATTAGCA TCTGAACTCG 
GTATCACTGC TGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCACACT ACTGAGGACA 
GGACCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTGTC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGGG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGTCTAGCAG AGCAAGCGAT GCGAGAGCTG 
CCCATCTTCC AACCAGCGCA CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 
CGCAGGACAG TCGACGGTCA GCTGACGCCC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 
TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTGTA CAATGACAGA GATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCTACGACTG 
GGGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCACAA 
ATATCAGACA ACCCAGGACA GGACCGAACC ACCCGCAAGG AAGAGGAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCAGTGC ACCTCGCATC 
TGCGGTCAGG GATCTGGAGA GAGCGATGAC AACGCTGAAA CTTTGGGAAT CCCCTCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCT
AGAGGCAACA ACTTCCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC 
GGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTAGCCTCA 
TTTGGAGCGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GTGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA TTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAAAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCC
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCTCATG 
CCAATCGACC TAATTAGTAC AGCCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTACG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA AGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTCGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTGGATACC CCGCAGAGGT
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA
GAAGAATGCT AGAATTCAGA TCGGTCAATG CAGTGGCTTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA AGCGATTGGC CCTGGGAAGA TCATCGATAA TGCAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCTG
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAAAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCCC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TCCACGGACC AAGTGAGAGG CCAGCCAGCA 
GCTGACGGCA AGCGTGAACA CCAGGCGGCC TGGGCACAGA ACAGCCCCGA CACAAGGCAA 
CCACCAGCCA TCCCAATCTG CGTCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGT 
CGCCCCCGAC CCAGACCACC AACCGCATCC CCACAGCCCC CGGGAAAGAG ACCCCCAGCA 
ACTGGAAGGC CCCTCCCCCT TTCCCTCAAC GCAAGAACTC CACAACCGAA CCGCACAAGC 
GATCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC CCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCGAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCACCTC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG CACACCAACC CTCGAACAGA CCCAGCACCC AGCCATCGAC 
AATTCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CAGGAACCGA ACCAGAATCC AGACCACCCT 
GGGCCACCAG TTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCTGCCC TGATCCGGTG GGCGGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGGCCC CCGAACCGCA AAAGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCCCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAATTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCTG GAGTTGTCCT GGCGGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
TTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CACTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAT 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTACTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAAGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG
TCCACCAAGT CCTGTGCTCG TACACTTGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ATCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAGGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTGGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGCGGTATCC GGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGA
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCATCCA GCATCGAGCC CACCTGAAAT 
TGTCTCCGGA TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ACAACCCCCA 
TCCTAGGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATAAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACCGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGTAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AATCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 
GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAAATTCGC 
AGCCCTTTGT CACAGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT
CCCCCTATCA ACGGATGATC CAGTGATAGA CAGGCTCTAC CTCTCATCTC ACAGAGGCGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG
AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 8640

GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 8700 

GGTTAGTCCC TACCTCTTCA CTGTTCCAAT TAAGGAAGCA GGCGAGGACT GCCATGCCCC 8760 

AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 8820 

ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTATGAT ACTTCCAGAG TTGAACATGC 8880 

TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 8940 

GCCTATAAGG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 9000 

CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGATATA TCACTCACTC 9060 

TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAACCGCAG 9120 

ATAGGGCTGC CAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 9180 

GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 9240 

CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 9300 

ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCATACAGC CTGGAGGACC 9360 

CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 9420 

TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGACCCACT 9480 

CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 9540 

CAAGGAAGAT CCGTGAGCTC CTCAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 9600 

AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 

AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 9720 

AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780

CCCATACTTG CCATAGGAGG AGACACACAC CAGTATTCTT CACTGGTAGT TCAGTTGAGT 9840 

TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900

TGACGTTTGA ACTGGTCTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960 

CCGCTATGAC CATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020 

AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTACCAAATT GTAGCCATGC 10080 

TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140 



SUBSTITUTE SHEET (RULE 26) 






WO 98/13501 



PCT/US97/16718 



- 131 - 



CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CCCTAGATTA CATTTTCATA ACTGATGACA
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTCT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAAGGAA CTGGGTCACG GAGGCTTGTA AATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGACA TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AATTATTTTA AGGACAATGG GATGGCCAAG GACGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCC ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
AGAACGTGAG AGCAGCAAAA GGGTTTATAG GATTCCCTCA TGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAGGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAAAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAAACCT 
CTGTCCTCTA TGTAAGTGAC CCTCATTGCC CCCCTGACCT TGACGCCCAT GTCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATTTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATAGGCCA TCACCTCAAG GCAAATGAGA
CAATTGTCTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATCCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATCGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC TCATCACTAA TGCCTGAAGA GACCCTTCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 
ATATGTGGGC AAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GCCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAT GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CAAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTACCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAGGG AATGCTTCTA GGGTTGGGTG
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTTAGG GGAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTACACA 
CAACTGTGTG CAACATGATT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTTCTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA TATCCAGGCA AAACACTTGT GTGTTCTAGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTACGAC CTGTAGAGAA ATGTGCAGTT CTAACCGATC 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGGTCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTTG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGTAATCT CGCCAATTAT GAAATCCACG CTTTCCGCAG AATCGGGTTA AACTCATCCG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 
ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGTC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT
TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG
AGACCTTACC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTAGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG 
TCTACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTCATG ACAGATCTCA 
AAGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG GTATCAACCC TATTCTGAAG AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAA CTGTGCAAAG 
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCATG 
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 
TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 
ATCTCAAGTC CGGTTACCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CTAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 
TAACAATCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 
ATTAATTGGT TGGACTCCGG GACCCTAATC CTGCCCTAGG TAGTTAGGCA TTATTTGCAA 
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Thr His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu






275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asn Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Asn Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr His Ser Arg Ser Pro Val His Thr Ser Thr Lys Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro His Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val
705 710 715 720

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ser Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Thr Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Ile Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Gly Ile Asn Pro
2005 2010 2015

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Ile Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 
ACCAAACAAA GTTGGGTAAG GATAGATCAA TCAATGATCA TATTCTAGTA CACTTAGGAT
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT
TGAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATTCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTA CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATCAGGCTG TTAGAGGTTG 
TTCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAAGCAG TAGTGATCAA TCCAGGTCCG 
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGATCCTGAG GGATTCAACA 
TGATTCTGGG TACCATTCTA GCCCAGATCT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC TTTACGCCGA TTCATGGTGG CTCTAATCCT GGATATCAAG AGGACACCCG 
GGAACAAACC TAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCTTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAT CTTTACCAGC 
AAATGGGAGA AACTGCACCC TACATGGTAA TCCTAGAGAA CTCAATTCAG AACAAGTTCA 
GCGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGTCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTGAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCCGAACTCG
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA
GGATCAGTAG AGCGGTCGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCAGGA TTGGGGGGCA AGGAAGACAG GAGGGTCAAA CAGAGTCGGG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG AGTCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCCTCC AACCAGCATG CCCCTAGACA TTGACACTGC ATCGGAGTCA GGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCTC TGCTCAGGCT GCAAGCCATG GCAGGAATCT 
TGGAAGAACA AGGCTCAGAC ACGGACACCC CTAGGGTATA CAATGACAGA GATCTTCTAG 
ATTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCAACCA ACCATCCACT CCCACGACTG 
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC GTCGAGGAAG CCATGGCAGC ATGGTCAGAA 
ATATCAGACA ATCCAGGACA GGACCGAGCC GCCTGCAAGG AAGAGGAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCTT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GATCTGGAGA AAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCTCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATCATG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CGATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAA ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGAACCCC 
AGTAGGGCCA GCACTTCCGA GACACCCATT AAAAAGGGGA CAGACGCGAG ATTGGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CGTCAGGGCC AGATGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCCGATGT CCAAGACATC 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA
TTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TTGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAACTCAATC CCGACCTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AGCCCGTTGC CAGCCGACAA 
CTCCAGGGAA TGACTAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAA 
CTAAAGCCGA TCGGGAAAAA GGTGAGCTCA GCCGTCGGGT TTGTCCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGT TGACTCTCCT TGATGATATC AAAGGAGCCA ACGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC TAATTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCTAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCTACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGTGATA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGAGATCCCC TAGGGCCTCC AATCGGGCGA GCATTCGGGT 
CCCTGCCCTT AGGTGTTGGT AGATCCACAG CAAAACCCGA GGAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACCCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAATGCAAA CCAAGTGTGC AATGCGGTTA ATCTAATACC GCTGGACACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCCA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTA GTGACCCTCA 
GGATTGACAA GGCGATTGGC CCTGGGAAGA TCATCGACAA TGCAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCACTGA TGGATATCAA TGAAGACCTT AATCGGTTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATACCCGAA AACGACCCCC CTCATAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCAAAAGAC TTCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACAGCA AGTGTGGACA CCAGGCGGCC CAAGCACAGA ACAGCCCCGA CACAAGGCCA 
CCACCAGCCA TCCCAATCCG CGTCCTCCTC GTAGGACCCC CGAGGACCAA CCCCCAAGGT 
CGCTCCGGAC ACAGACCACC AGCCGCATCC CCACAGCCCT CGGGAAAGGA ACCCCCAGCA 
ACTGGAAGGC CCCTTCCCCC CTCCCCCAAC GCAAGAACCC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGACCCTCCC TCCCCGGCAT 
ACTAAACAAA ACTTAGGGCC AAGGAACACA CACACCCGAC AGAACCCAGA CCCCGGCCCG 
CGGCACCGCG CCCCCACCCC CCGAAAACCA GAGGGAGCCC CCAACCAATC CCGCCGCCCC 
CCCCGGTGCC CACAGGTAGG CACACCAACC CCCGAACAGA CCCAGCACCC AGCCACCGAC 
AATCCAAGAC GGGGGGCCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCATCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAGCCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGAAAAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGGCC CGATCCGGCG GGAAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
TGGGGGACCC CCAAACCGCA AAAGACATCA GTATCCCACC GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CATCCGACGA CACTCAATTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAATGTCTT TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGGAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTGG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAAATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTTGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCAAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GGCAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTAGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCCATC CAGGCTTTGA GCTATGCGCT TGGGGGAGAT 
ATCAATAAGG TATTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ATATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTCT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTGGTCGAGG TGAACGGTGT GACCATCCAA GTCGGGAGCA GGAGGTATCC GGACGCGGTG 
TACCTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAAGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGCTGGAG GATGCCAAGG AATTGCTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTTTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GGGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACAGGG 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCCCTACAA CTCTTGAAAC ACAGATTTCC 
CACAAGTCTC CTCTCCGTCA TCAAGCAACC ACCGCATCCA GCATCAAGGC CACCCGAAAT 
TGTCTCCGGC TTCCCTCTGG CCGAACGATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC ACCGAGACCG AATAAATGCC TTCTACAAAG ACAACCCCCA 
TCCTAAGGGA AGTAGGATAG TTATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTATTCG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTC CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAGAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
GATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 



CATCTCTGAC AAAATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAACTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGG CCAGGGCAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTGGA CTTGTATTTA AGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACCATGACAT CCCAGGGAAT GTACGGGGGA ACTTACCTAG TGGGAAAGCC 
TAATCTGAGC AGTAAAGGGT CAGAGTTGTC ACAACTGAGC ATGCACCGAG TGTTTGAAGT 
AGGGGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATTTTGA 
GCAACCAGTC AGTAATGATT TCAGCAACTG CATGGTGGCT TTGGGGGAGC TCAGGTTCGC 
AGCCCTCTGT CACAGGGAAG ATTCTGTCAC GGTTCCCTAT CAGGGGTCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCCTATCA ACGGATGATC CAGTGATAGA TAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGGACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAGC AGGCGTGTAA GGGTAAAAAC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTAATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCAGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAAACCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC AACCTCTTCA CTGTTCCAAT CAAGGAAGCA GGCGAGGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTAATTCT 
ACCTGGTCAG GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TATGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCAA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA TTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACTCGG GAAGATGGAA CCAATCGCAG 
ATAGGGCTGC CAGTGAACCG ATCACATGAT GTCACTCAGA CACCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTTCCC GTCATGGACT 
CGCTATCTGT CAACCAGATC TTGTACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCTATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTTGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTCTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CAAGGAAGAT CCGTGAGCTC CTAAAAAAGG GAAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCCTGAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAATT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAATGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGC 
TGTTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAGGA GTCTCAACAT GTATATTACC 
TGACGTTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC CATTGATGCT AGGTATGCAG AACTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCTATGC 
TGGAGCCACT TTCACTTGCT TACCTGCAAC TGAGGGACAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CCTTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTCAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAGGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCCCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAGATCAT 
TTGCTGGAGT GAGATTTGGC TGTTTTATGC CTCTTAGCCT GGACAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGATCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATA TGATAATGTA TGTCGTAAGT GGAGCCTACC 
TCCATGACCC TGAGTTCAAT CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT CGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATC GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAGTATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTGGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTTAA AGCAGAAAAA GGGTTTGTAG GATTCCCTCA TGTAATTCGG CAGAATCAAG 
ACACTGATCA TCCGGAGAAT ATAGAAACCT ACGAGACAGT CAGCGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTATTT GCACAGAGGC 
TAAATGAGAT TTACGGATTA CCCTCATTTT TTCAGTGGCT GCATAAGAGG CTTGAAACCT 
CTGTCCTCTA TGTAAGTGAT CCTCATTGCC CCCCCGACCT TGACGCCCAT GTCCCGTTAT 
GCAAAGTCCC CAATGACCAA ATCTTCATCA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTACTTATA CCTGGCTGCT TATGAGAGCG 
GGGTAAGGAT TGCCTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCTTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ACATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATTGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA TCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTTTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGAGATGT AGTCATACCC CTCCTCACAA 
ACAACGATCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAACATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCATCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGCGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTAA 
TCCATAGTCC AAACCCAATG TTAAAAGGGT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGAGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTA GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTTAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCTAGAGCC CTAAGAAGCC 
ATATGTGGGC AAGACTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTA GATCCTTGCG ATCTGCCGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CAAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCGACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCAG AGTGGCAAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA AGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACTGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTCT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTTATAGAG CCAAGATTAT 
TCACCATCTA CTTGGGCCAG TGTGCAGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCTTC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTTGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTTTTGT GTGAAAGCGA TGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTGT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCGATTCGA GGTCTAAGGC CGGTAGAGAA ATGTGCAGTT CTAACCGATC 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGTCG AGGATCTATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTTG ATGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGGTCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGATTTCAGA CCTCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGTAGTCT TGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTA AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAAG 
ACGGCTTGTT CTTGGGTGAG GGGTCGGGTT CTATGTTGAT CACTTATAAG GAGATACTAA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAGGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT ATAGATTGCT 
TCAATTTCAT AGTCAGTAAT ATCCCTACCT CTAGTGTGGG ATTTATCCAT TCAGATATAG 
AGACCTTACC CAACAAAGAT ACTATAGAGA AGTTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTACT CCTTGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGCTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TCTACCCTAG GTACAGCAAC TTCATATCTA CTGAATCTTA TTTAGTTATG ACAGATCTCA 
AAGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGC 
GGACTTCACC TGGACTTATA GGTCACATCC TATCTATCAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGGCGCAGTT AGTAGAGGTG ATATCAACCC TATTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AGTTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360 

AATTAATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAACTCT ATACTCATCC 15420 

TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 15480 

CTTACCCCGT ATTGGTAAGT AGTAGGCAAC GAGAACTTGT ATCTAGGATC ACTCGCAAAT 15540 

TTTGGGGGCA TATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATCGG TTTATCCAGA 15600 

ATCTCAAGTC CGGTTATCTA ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660 

CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTAAA ACGTGAGTGG GTTTTTAAGG 15720 

TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGCGCT CTGATTAAGG 15780 

ATTAATTGGT TGAACTCCGG AACCCTAATC CTACCCTAGG TAGTTAGGCA TTATTTGCAA 15840 

TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Ile Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Ala Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Arg Ser Phe Ala Gly Val Arg Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Met Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 
610 615 620 

Val Lys Ala Glu Lys Gly Phe Val Gly Phe Pro His Val Ile Arg Gln 
625 630 635 640 

Asn Gln Asp Thr Asp His Pro Glu Asn Ile Glu Thr Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Val 
705 710 715 720 

Pro Leu Cys Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Arg Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 



Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 



Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Val Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Asp Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Ser Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Ile Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys 
1875 1880 1885 



Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 



Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Gly Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Ile Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Ser Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Val 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Arg Lys Leu Ile Asn Arg Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 



Tyr Ser Ala Leu Ile Lys Asp 
2180 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TGGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 
GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 
GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 
TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCCGA CACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC
CGAAGGACCC CCGAACCGCA AAGGACACCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG
TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GATCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 
ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCAXGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA
AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGC 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 
TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met
1955 1960 1965

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180



(2) INFORMATION FOR SEQ ID NO: 11:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 

GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 240 

TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTGAGGTT AATTGGAAAC CCGGATGTGA 300 

GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 360 

GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 420 

TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 480 

ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 540 

GATGGTTCGG GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 600 

TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 660 

CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 720

TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 780

AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 840

GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 900

GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 960

GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 1020

AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 1080

GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 1140

ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 1200

GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 1260

GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 1320

AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 1380

GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 1440

GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 1500

CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC AACGGAGTCC AGCCAAGATC 1560

CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 1620

CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 1680

ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCATCCATC ATTGTTATAA 1740

AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 1800

GAGCCAATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 1860

CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 1920

ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 1980

GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 2040

CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 2100

AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTACG TTTATGATCA CAGCGGTGAA 2160

GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 2220

AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 2280

GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 2340

GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 2400

AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 2460

GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 2520

TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 2580

CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 2640

GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 2700

AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 2760

AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 2820

CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880

AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940

AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000

GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060

CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120

CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180

GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240

CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300

CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360

CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420

GCCTCCCAAG GTCCACAATG ACAGAGACCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480

AAGGGTCGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540

TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600

TGCTGGGGGT TGTTGAGGAC AGCGATTCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660

TCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720

CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780

ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840

TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900

TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960 

GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020 

GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080 

CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 

ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200

GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 

GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 4320 

TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 

AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 4440 

TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 4500 

GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 4560 

GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA CACAAGGCCA 4620 

CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 4680 

TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 4740 

ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 4800 

GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 4860 

ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGTCCA 4920 

CGGTGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 4980 

CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 5040 

AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 5100 

GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 5160 

GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 5220 

ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 5280 

CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 5340 

CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 5400
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGACAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGTTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCGGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATCAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
AATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC TACCTCTTCA CTGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 
ATAGGGCTGC TAGTGAACCA ATCACATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAACTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCTTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATGCA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC AGAGGCTATG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAA GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TGACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATCTCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG 
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 
TCTGGGGGCA CATTCTTCTT TACTCCGGGA ACAAAAAGTT GATAAATAAG TTTATCCAGA 
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu 
1 5 10 15 

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala 
20 25 30 

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn 
35 40 45 

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn 
50 55 60 

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro 
65 70 75 80 

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn 
85 90 95 

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys 
100 105 110 

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu 
115 120 125 

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln 
145 150 155 160 

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg 
165 170 175 

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr 
180 185 190 

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp 
195 200 205 

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr 
210 215 220 

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met 
225 230 235 240 

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly 
245 250 255 

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu 
260 265 270 

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu 
275 280 285 

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe 
290 295 300 

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly 
305 310 315 320 

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Thr Glu Ala Leu Asp Tyr 
325 330 335 

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe 
340 345 350 

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr 
370 375 380 

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr 
385 390 395 400 

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His 
405 410 415 

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr 
420 425 430 

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe 
435 440 445 

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu 
450 455 460 

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr 
465 470 475 480 

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe 
515 520 525 

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg 
530 535 540 

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala 
545 550 555 560 

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly 
565 570 575 

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln 
625 630 635 640 

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val 
645 650 655 

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly 
675 680 685 

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val 
690 695 700 

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile 
705 710 715 720 

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met 
725 730 735 

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile 
740 745 750 

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser 
755 760 765 

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro 
770 775 780 

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His 
805 810 815 

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr 
820 825 830 

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys 
835 840 845 

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr 
850 855 860 

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu 
865 870 875 880 

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val 
885 890 895 

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met 
900 905 910 

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile 
915 920 925 

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn 
930 935 940 

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser 
945 950 955 960 

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu 
965 970 975 

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser 
995 1000 1005 

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His 
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val 
1045 1050 1055 

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg 
1060 1065 1070 

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala 
1075 1080 1085 

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser 
1090 1095 1100 

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly 
1105 1110 1115 1120 

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu 
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly 
1155 1160 1165 

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser 
1170 1175 1180 

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp 
1185 1190 1195 1200 

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr 
1205 1210 1215 

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala 
1235 1240 1245 

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg 
1250 1255 1260 

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile 
1265 1270 1275 1280 

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln 
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp 
1315 1320 1325 

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu 
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp 
1365 1370 1375 

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu 
1380 1385 1390 

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp 
1395 1400 1405 

Ala Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe 
1410 1415 1420 

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr 
1425 1430 1435 1440 

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met 
1445 1450 1455 

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile 
1460 1465 1470 

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly 
1475 1480 1485 

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro 
1490 1495 1500 

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg 
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His 
1540 1545 1550 

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met 
1555 1560 1565 

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala 
1605 1610 1615 

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg 
1620 1625 1630 

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala 
1635 1640 1645 

Met Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val 
1650 1655 1660 

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys 
1665 1670 1675 1680 

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala 
1685 1690 1695 

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn 
1700 1705 1710 

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu 
1715 1720 1725 

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly 
1730 1735 1740 

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn 
1745 1750 1755 1760 

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg 
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe 
1795 1800 1805 

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu 
1810 1815 1820 

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val 
1825 1830 1835 1840 

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp 
1845 1850 1855 

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr 
1860 1865 1870 

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asp Lys 
1875 1880 1885 

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala 
1890 1895 1900 

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro 
1905 1910 1915 1920 

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His 
1925 1930 1935 

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser 
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr 
1970 1975 1980 

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys 
1985 1990 1995 2000 

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro 
2005 2010 2015 

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly 
2020 2025 2030 

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp 
2035 2040 2045 

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr 
2050 2055 2060 

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met 
2065 2070 2075 2080 

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile 
2085 2090 2095 

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly 
2100 2105 2110 

Asn Lys Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr 
2115 2120 2125 

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys 
2130 2135 2140 

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val 
2145 2150 2155 2160 

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp 
2180 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TTTTCTAGTG CACTTAGGAT 
TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAAGGAT ATCCGAGATG GCCACACTTT 
TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 
GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CCAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG 
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCTCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GGCCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG 
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC 
CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA 
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA GAGCGGTGAA 
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTCT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC TCCGGACCCC 
GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTG TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 2880 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 2940 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 3000 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA 3060 
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 3120 
CTAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 3180 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 3240 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 3300 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG 3360 
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 3420 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 3480 
AAGGGTTGAT CGCTCCGATA CAACCCACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 3540 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTTTC 3600 
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 3660 
CCCTGCCCTT AGGTGTTGGC AAATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 3720 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 3780 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 3840 
TCAACGCAAA CCAAGTGTGC AGTGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 3900 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 3960 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 4020 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 4080 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 4140 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 4200 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 4260 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TAGATATCAA TGAAGACCTT AATCGATTAC 4320 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 4380 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACGGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCGCGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA 
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCC AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCCCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT 
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TCCACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TCACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGATGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATCATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTAGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT 
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGCC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACGGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CCCCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
GCCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAGGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG
ATAGGGCTGC TAGTGAACCA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA 
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 
AGCCCTTTCT GTTTTGGTTT ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 
ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG 
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC 
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC 
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGACCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA 
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCCCA TTATAGAGAA GTGAACCTTG 
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCAAAGAAG TCAACAAGGG ATGTTCCACG
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG
ACTAATTGGT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA
TAGATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2183 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp 
130 135 140 

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu 
355 360 365 
Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg 
485 490 495 

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp 
500 505 510 

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala 
580 585 590 

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro 
595 600 605 

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn 
610 615 620 

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr 
785 790 795 800 

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960



Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020 

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu 
1025 1030 1035 1040 

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135 

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg 
1140 1145 1150 

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser 
1220 1225 1230 

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295 

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr 
1300 1305 1310 

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340 

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val 
1345 1350 1355 1360 

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520 

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro 
1525 1530 1535 

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565 

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu 
1570 1575 1580 

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val 
1585 1590 1595 1600 

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Thr Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775 

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly 
1780 1785 1790 

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820 



Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870



Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser His
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950 

Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Gln Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly 
2165 2170 2175 

Tyr Ser Ala Leu Ile Lys Asp
2180 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15894 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

ACCAAACAAA GTTGGGTAAG GATAGTTCAA TCAATGATCA TCTTCTAGTG CACTTAGGAT 60 

TCAAGATCCT ATTATCAGGG ACAAGAGCAG GATTAGGGAT ATCCGAGATG GCCACACTTT 120 

TAAGGAGCTT AGCATTGTTC AAAAGAAACA AGGACAAACC ACCCATTACA TCAGGATCCG 180 






GTGGAGCCAT CAGAGGAATC AAACACATTA TTATAGTACC AATCCCTGGA GATTCCTCAA 
TTACCACTCG ATCCAGACTT CTGGACCGGT TGGTCAGGTT AATTGGAAAC CCGGATGTGA 
GCGGGCCCAA ACTAACAGGG GCACTAATAG GTATATTATC CTTATTTGTG GAGTCTCCAG 
GTCAATTGAT TCAGAGGATC ACCGATGACC CTGACGTTAG CATAAGGCTG TTAGAGGTTG 
TCCAGAGTGA CCAGTCACAA TCTGGCCTTA CCTTCGCATC AAGAGGTACC AACATGGAGG 
ATGAGGCGGA CAAATACTTT TCACATGATG ATCCAATTAG TAGTGATCAA TCCAGGTTCG
GATGGTTCGA GAACAAGGAA ATCTCAGATA TTGAAGTGCA AGACCCTGAG GGATTCAACA 
TGATTCTGGG TACCATCCTA GCCCAAATTT GGGTCTTGCT CGCAAAGGCG GTTACGGCCC 
CAGACACGGC AGCTGATTCG GAGCTAAGAA GGTGGATAAA GTACACCCAA CAAAGAAGGG 
TAGTTGGTGA ATTTAGATTG GAGAGAAAAT GGTTGGATGT GGTGAGGAAC AGGATTGCCG 
AGGACCTCTC CTTACGCCGA TTCATGGTCG CTCTAATCCT GGATATCAAG AGAACACCCG 
GAAACAAACC CAGGATTGCT GAAATGATAT GTGACATTGA TACATATATC GTAGAGGCAG 
GATTAGCCAG TTTTATCCTG ACTATTAAGT TTGGGATAGA AACTATGTAT CCTGCTCTTG 
GACTGCATGA ATTTGCTGGT GAGTTATCCA CACTTGAGTC CTTGATGAAC CTTTACCAGC 
AAATGGGGGA AACTGCACCC TACATGGTAA TCCTGGAGAA CTCAATTCAG AACAAGTTCA 
GTGCAGGATC ATACCCTCTG CTCTGGAGCT ATGCCATGGG AGTAGGAGTG GAACTTGAAA 
ACTCCATGGG AGGTTTGAAC TTTGGCCGAT CTTACTTTGA TCCAGCATAT TTTAGATTAG 
GGCAAGAGAT GGTAAGGAGG TCAGCTGGAA AGGTCAGTTC CACATTGGCA TCTGAACTCG 
GTATCACTGC CGAGGATGCA AGGCTTGTTT CAGAGATTGC AATGCATACT ACTGAGGACA 
AGATCAGTAG AGCGGTTGGA CCCAGACAAG CCCAAGTATC ATTTCTACAC GGTGATCAAA 
GTGAGAATGA GCTACCGAGA TTGGGGGGCA AGGAAGATAG GAGGGTCAAA CAGAGTCGAG 
GAGAAGCCAG GGAGAGCTAC AGAGAAACCG GGCCCAGCAG AGCAAGTGAT GCGAGAGCTG 
CCCATCTTCC AACCGGCACA CCCCTAGACA TTGACACTGC ATCGGAGTCC AGCCAAGATC 
CGCAGGACAG TCGAAGGTCA GCTGACGCCC TGCTTAGGCT GCAAGCCATG GCAGGAATCT 
CGGAAGAACA AGGCTCAGAC ACGGACACCC CTATAGTGTA CAATGACAGA AATCTTCTAG 
ACTAGGTGCG AGAGGCCGAG GACCAGAACA ACATCCGCCT ACCCTCCATC ATTGTTATAA 
AAAACTTAGG AACCAGGTCC ACACAGCCGC CAGCCCATCA ACCATCCACT CCCACGATTG
GAGCCGATGG CAGAAGAGCA GGCACGCCAT GTCAAAAACG GACTGGAATG CATCCGGGCT 
CTCAAGGCCG AGCCCATCGG CTCACTGGCC ATCGAGGAAG CTATGGCAGC ATGGTCAGAA 
ATATCAGACA ACCCAGGACA GGAGCGAGCC ACCTGCAGGG AAGAGAAGGC AGGCAGTTCG 
GGTCTCAGCA AACCATGCCT CTCAGCAATT GGATCAACTG AAGGCGGTGC ACCTCGCATC
CGCGGTCAGG GACCTGGAGA GAGCGATGAC GACGCTGAAA CTTTGGGAAT CCCCCCAAGA
AATCTCCAGG CATCAAGCAC TGGGTTACAG TGTTATTATG TTTATGATCA CAGCGGTGAA
GCGGTTAAGG GAATCCAAGA TGCTGACTCT ATCATGGTTC AATCAGGCCT TGATGGTGAT 
AGCACCCTAT CAGGAGGAGA CAATGAATCT GAAAACAGCG ATGTGGATAT TGGCGAACCT 
GATACCGAGG GATATGCTAT CACTGACCGG GGATCTGCTC CCATCTCTAT GGGGTTCAGG 
GCTTCTGATG TTGAAACTGC AGAAGGAGGG GAGATCCACG AGCTCCTGAG ACTCCAATCC 
AGAGGCAACA ACTTTCCGAA GCTTGGGAAA ACTCTCAATG TTCCTCCGCC CCCGGACCCC 
GGTAGGGCCA GCACTTCCGG GACACCCATT AAAAAGGGCA CAGACGCGAG ATTAGCCTCA 
TTTGGAACGG AGATCGCGTC TTTATTGACA GGTGGTGCAA CCCAATGTGC TCGAAAGTCA 
CCCTCGGAAC CATCAGGGCC AGGTGCACCT GCGGGGAATG TCCCCGAGTA TGTGAGCAAT 
GCCGCACTGA TACAGGAGTG GACACCCGAA TCTGGTACCA CAATCTCCCC GAGATCCCAG 
AATAATGAAG AAGGGGGAGA CTATTATGAT GATGAGCTGT TCTCTGATGT CCAAGATATT 
AAAACAGCCT TGGCCAAAAT ACACGAGGAT AATCAGAAGA TAATCTCCAA GCTAGAATCA 
CTGCTGTTAT TGAAGGGAGA AGTTGAGTCA ATTAAGAAGC AGATCAACAG GCAAAATATC 
AGCATATCCA CCCTGGAAGG ACACCTCTCA AGCATCATGA TCGCCATTCC TGGACTTGGG 
AAGGATCCCA ACGACCCCAC TGCAGATGTC GAAATCAATC CCGACTTGAA ACCCATCATA 
GGCAGAGATT CAGGCCGAGC ACTGGCCGAA GTTCTCAAGA AACCCGTTGC CAGCCGACAA
CTCCAAGGAA TGACAAATGG ACGGACCAGT TCCAGAGGAC AGCTGCTGAA GGAATTTCAG 
CCAAAGCCGA TCGGGAAAAA GATGAGCTCA GCCGTCGGGT TTGTTCCTGA CACCGGCCCT 
GCATCACGCA GTGTAATCCG CTCCATTATA AAATCCAGCC GGCTAGAGGA GGATCGGAAG 
CGTTACCTGA TGACTCTCCT TGATGATATC AAAGGAGCCA ATGATCTTGC CAAGTTCCAC 
CAGATGCTGA TGAAGATAAT AATGAAGTAG CTACAGCTCA ACTTACCTGC CAACCCCATG
CCAGTCGACC CAACTAGTAC AACCTAAATC CATTATAAAA AACTTAGGAG CAAAGTGATT 
GCCTCCCAAG TTCCACAATG ACAGAGATCT ACGACTTCGA CAAGTCGGCA TGGGACATCA 
AAGGGTCGAT CGCTCCGATA CAACCGACCA CCTACAGTGA TGGCAGGCTG GTGCCCCAGG 
TCAGAGTCAT AGATCCTGGT CTAGGCGACA GGAAGGATGA ATGCTTTATG TACATGTCTC 
TGCTGGGGGT TGTTGAGGAC AGCGATCCCC TAGGGCCTCC AATCGGGCGA GCATTTGGGT 
CCCTGCCCTT AGGTGTTGGC AGATCCACAG CAAAGCCCGA AAAACTCCTC AAAGAGGCCA 
CTGAGCTTGA CATAGTTGTT AGACGTACAG CAGGGCTCAA TGAAAAACTG GTGTTCTACA 
ACAACACCCC ACTAACTCTC CTCACACCTT GGAGAAAGGT CCTAACAACA GGGAGTGTCT 
TCAACGCAAA CCAAGTGTGC AATGCGGTTA ATCTGATACC GCTCGATACC CCGCAGAGGT 
TCCGTGTTGT TTATATGAGC ATCACCCGTC TTTCGGATAA CGGGTATTAC ACCGTTCCTA 
GAAGAATGCT GGAATTCAGA TCGGTCAATG CAGTGGCCTT CAACCTGCTG GTGACCCTTA 
GGATTGACAA GGCGATAGGC CCTGGGAAGA TCATCGACAA TACAGAGCAA CTTCCTGAGG 
CAACATTTAT GGTCCACATC GGGAACTTCA GGAGAAAGAA GAGTGAAGTC TACTCTGCCG 
ATTATTGCAA AATGAAAATC GAAAAGATGG GCCTGGTTTT TGCACTTGGT GGGATAGGGG 
GCACCAGTCT TCACATTAGA AGCACAGGCA AAATGAGCAA GACTCTCCAT GCACAACTCG 
GGTTCAAGAA GACCTTATGT TACCCGCTGA TGGATATCAA TGAAGACCTT AATCGATTAC 
TCTGGAGGAG CAGATGCAAG ATAGTAAGAA TCCAGGCAGT TTTGCAGCCA TCAGTTCCTC 
AAGAATTCCG CATTTACGAC GACGTGATCA TAAATGATGA CCAAGGACTA TTCAAAGTTC 
TGTAGACCGT AGTGCCCAGC AATGCCCGAA AACGACCCCC CTCACAATGA CAGCCAGAAG 
GCCCGGACAA AAAAGCCCCC TCCGAAAGAC TCCACTGACC AAGCGAGAGG CCAGCCAGCA 
GCCGACGGCA AGCACGAACA CCAGGCGGCC CCAGCACAGA ACAGCCCTGA TACAAGGCCA 
CCACCAGCCA CCCCAATCTG CATCCTCCTC GTGGGACCCC CGAGGACCAA CCCCCAAGGC 
TGCCCCCGAT CCAAACCACC AACCGCATCC CCACCACCCC CGGGAAAGAA ACCCCCAGCA 
ATTGGAAGGC CCCTCCCCCT CTTCCTCAAC ACAAGAACTC CACAACCGAA CCGCACAAGC 
GACCGAGGTG ACCCAACCGC AGGCATCCGA CTCCCTAGAC AGATCCTCTC TCCCCGGCAA 
ACTAAACAAA ACTTAGGGCC AAGGAACATA CACACCCAAC AGAACCCAGA CCCCGGCCCA
CGGCGCCGCG CCCCCAACCC CCGACAACCA GAGGGAGCCC CCAACCAATC CCGCCGGCTC 
CCCCGGTGCC CACAGGCAGG GACACCAACC CCCGAACAGA CCCAGCACCT AACCATCGAC 
AATCCAAGAC GGGGGGGCCC CCCCAAAAAA AGGCCCCCAG GGGCCGACAG CCAGCACCGC 
GAGGAAGCCC ACCCACCCCA CACACGACCA CGGCAACCAA ACCAGAACCC AGACCACCCT 
GGGCCACCAG CTCCCAGACT CGGCCATCAC CCCGCAGAAA GGAAAGGCCA CAACCCGCGC 
ACCCCAGCCC CGATCCGGCG GGGAGCCACC CAACCCGAAC CAGCACCCAA GAGCGATCCC 
CGAAGGACCC CCGAACCGCA AAGGACATCA GTATCCCACA GCCTCTCCAA GTCCCCCGGT 
CTCCTCCTCT TCTCGAAGGG ACCAAAAGAT CAATCCACCA CACCCGACGA CACTCAACTC 
CCCACCCCTA AAGGAGACAC CGGGAATCCC AGAATCAAGA CTCATCCAAT GTCCATCATG 
GGTCTCAAGG TGAACGTCTC TGCCATATTC ATGGCAGTAC TGTTAACTCT CCAAACACCC 
ACCGGTCAAA TCCATTGGGG CAATCTCTCT AAGATAGGGG TGGTAGGAAT AGGAAGTGCA 
AGCTACAAAG TTATGACTCG TTCCAGCCAT CAATCATTAG TCATAAAATT AATGCCCAAT 
ATAACTCTCC TCAATAACTG CACGAGGGTA GAGATTGCAG AATACAGGAG ACTACTGAGA 
ACAGTTTTGG AACCAATTAG AGATGCACTT AATGCAATGA CCCAGAATAT AAGACCGGTT 
CAGAGTGTAG CTTCAAGTAG GAGACACAAG AGATTTGCGG GAGTAGTCCT GGCAGGTGCG 
GCCCTAGGCG TTGCCACAGC TGCTCAGATA ACAGCCGGCA TTGCACTTCA CCAGTCCATG 
CTGAACTCTC AAGCCATCGA CAATCTGAGA GCGAGCCTGG AAACTACTAA TCAGGCAATT 
GAGGCAATCA GACAAGCAGG GCAGGAGATG ATATTGGCTG TTCAGGGTGT CCAAGACTAC 
ATCAATAATG AGCTGATACC GTCTATGAAC CAACTATCTT GTGATTTAAT CGGCCAGAAG 
CTCGGGCTCA AATTGCTCAG ATACTATACA GAAATCCTGT CATTATTTGG CCCCAGCTTA 
CGGGACCCCA TATCTGCGGA GATATCTATC CAGGCTTTGA GCTATGCGCT TGGAGGAGAC 
ATCAATAAGG TGTTAGAAAA GCTCGGATAC AGTGGAGGTG ATTTACTGGG CATCTTAGAG 
AGCAGAGGAA TAAAGGCCCG GATAACTCAC GTCGACACAG AGTCCTACTT CATTGTCCTC 
AGTATAGCCT ATCCGACGCT GTCCGAGATT AAGGGGGTGA TTGTCCACCG GCTAGAGGGG 
GTCTCGTACA ACATAGGCTC TCAAGAGTGG TATACCACTG TGCCCAAGTA TGTTGCAACC 
CAAGGGTACC TTATCTCGAA TTTTGATGAG TCATCGTGTA CTTTCATGCC AGAGGGGACT
GTGTGCAGCC AAAATGCCTT GTACCCGATG AGTCCTCTGC TCCAAGAATG CCTCCGGGGG 
TACACCAAGT CCTGTGCTCG TACACTCGTA TCCGGGTCTT TTGGGAACCG GTTCATTTTA 
TCACAAGGGA ACCTAATAGC CAATTGTGCA TCAATCCTTT GCAAGTGTTA CACAACAGGA 
ACGATCATTA ATCAAGACCC TGACAAGATC CTAACATACA TTGCTGCCGA TAACTGCCCG 
GTAGTCGAGG TGAACGGCGT GACCATCCAA GTCGGGAGCA GGAGGTATCC AGACGCTGTG 
TACTTGCACA GAATTGACCT CGGTCCTCCC ATATTATTGG AGAGGTTGGA CGTAGGGACA 
AATCTGGGGA ATGCAATTGC TAAGTTGGAG GATGCCAAGG AATTGTTGGA GTCATCGGAC 
CAGATATTGA GGAGTATGAA AGGTTTATCG AGCACTTGCA TAGTCTACAT CCTGATTGCA 
GTGTGTCTTG GAGGGTTGAT AGGGATCCCC GCTTTAATAT GTTGCTGCAG GGGGCGTTGT 
AACAAAAAGG GAGAACAAGT TGGTATGTCA AGACCAGGCC TAAAGCCTGA TCTTACGGGA 
ACATCAAAAT CCTATGTAAG GTCGCTCTGA TCCTCTACAA CTCTTGAAAC ACAAATGTCC 
CACAAGTCTC CTCTTCGTCA TCAAGCAACC ACCGCACCCA GCATCAAGCC CACCTGAAAT 
TATCTCCGGC TTCCCTCTGG CCGAACAATA TCGGTAGTTA ATTAAAACTT AGGGTGCAAG 
ATCATCCACA ATGTCACCAC AACGAGACCG GATAAATGCC TTCTACAAAG ATAACCCCCA 
TCCCAAGGGA AGTAGGATAG TCATTAACAG AGAACATCTT ATGATTGATA GACCTTATGT 
TTTGCTGGCT GTTCTGTTTG TCATGTTTCT GAGCTTGATC GGGTTGCTAG CCATTGCAGG 
CATTAGACTT CATCGGGCAG CCATCTACAC CGCAGAGATC CATAAAAGCC TCAGCACCAA 
TCTAGATGTA ACTAACTCAA TCGAGCATCA GGTCAAGGAC GTGCTGACAC CACTCTTCAA 
AATCATCGGT GATGAAGTGG GCCTGAGGAC ACCTCAGAGA TTCACTGACC TAGTGAAATT 
CATCTCTGAC AAGATTAAAT TCCTTAATCC GGATAGGGAG TACGACTTCA GAGATCTCAC 
TTGGTGTATC AACCCGCCAG AGAGAATCAA ATTGGATTAT GATCAATACT GTGCAGATGT 
GGCTGCTGAA GAGCTCATGA ATGCATTGGT GAACTCAACT CTACTGGAGA CCAGAACAAC 
CAATCAGTTC CTAGCTGTCT CAAAGGGAAA CTGCTCAGGG CCCACTACAA TCAGAGGTCA 
ATTCTCAAAC ATGTCGCTGT CCCTGTTAGA CTTGTATTTA GGTCGAGGTT ACAATGTGTC 
ATCTATAGTC ACTATGACAT CCCAGGGAAT GTATGGGGGA ACTTACCTAG TGGAAAAGCC 
TAATCTGAGC AGCAAAAGGT CAGAGTTGTC ACAACTGAGC ATGTACCGAG TGTTTGAAGT
AGGTGTTATC AGAAATCCGG GTTTGGGGGC TCCGGTGTTC CATATGACAA ACTATCTTGA 
GCAACCAGTC AGTAATGATC TCAGCAACTG TATGGTGGCT TTGGGGGAGC TCAAACTCGC 
AGCCCTTTGT CACCGGGAAG ATTCTATCAC AATTCCCTAT CAGGGATCAG GGAAAGGTGT 
CAGCTTCCAG CTCGTCAAGC TAGGTGTCTG GAAATCCCCA ACCGACATGC AATCCTGGGT 
CACCTTATCA ACGGATGATC CAGTGATAGA CAGGCTTTAC CTCTCATCTC ACAGAGGTGT 
TATCGCTGAC AATCAAGCAA AATGGGCTGT CCCGACAACA CGAACAGATG ACAAGTTGCG 
AATGGAGACA TGCTTCCAAC AGGCGTGTAA GGGTAAAATC CAAGCACTCT GCGAGAATCC 
CGAGTGGGCA CCATTGAAGG ATAACAGGAT TCCTTCATAC GGGGTCTTGT CTGTTGATCT 
GAGTCTGACA GTTGAGCTTA AAATCAAAAT TGCTTCGGGA TTCGGGCCAT TGATCACACA 
CGGTTCAGGG ATGGACCTAT ACAAATCCAA CCACAACAAT GTGTATTGGC TGACTATCCC 
ACCAATGAAG AACCTAGCCT TAGGTGTAAT CAACACATTG GAGTGGATAC CGAGATTCAA 
GGTTAGTCCC TACCTCTTCA ATGTCCCAAT TAAGGAAGCA GGCGAAGACT GCCATGCCCC 
AACATACCTA CCTGCGGAGG TGGATGGTGA TGTCAAACTC AGTTCCAATC TGGTGATTCT 
ACCTGGTCAA GATCTCCAAT ATGTTTTGGC AACCTACGAT ACTTCCAGGG TTGAACATGC 
TGTGGTTTAT TACGTTTACA GCCCAAGCCG CTCATTTTCT TACTTTTATC CTTTTAGGTT 
GCCTATAAAG GGGGTCCCCA TCGAATTACA AGTGGAATGC TTCACATGGG ACCAAAAACT 
CTGGTGCCGT CACTTCTGTG TGCTTGCGGA CTCAGAATCT GGTGGACATA TCACTCACTC 
TGGGATGGTG GGCATGGGAG TCAGCTGCAC AGTCACCCGG GAAGATGGAA CCAATCGCAG 
ATAGGGCTGC TAGTGAACTA ATCTCATGAT GTCACCCAGA CATCAGGCAT ACCCACTAGT 
GTGAAATAGA CATCAGAATT AAGAAAAACG TAGGGTCCAA GTGGTTCCCC GTTATGGACT 
CGCTATCTGT CAACCAGATC TTATACCCTG AAGTTCACCT AGATAGCCCG ATAGTTACCA 
ATAAGATAGT AGCCATCCTG GAGTATGCTC GAGTCCCTCA CGCTTACAGC CTGGAGGACC 
CTACACTGTG TCAGAACATC AAGCACCGCC TAAAAAACGG ATTTTCCAAC CAAATGATTA 
TAAACAATGT GGAAGTTGGG AATGTCATCA AGTCCAAGCT TAGGAGTTAT CCGGCCCACT 
CTCATATTCC ATATCCAAAT TGTAATCAGG ATTTATTTAA CATAGAAGAC AAAGAGTCAA
CGAGGAAGAT CCGTGAACTC CTCAAAAAGG GGAATTCGCT GTACTCCAAA GTCAGTGATA 9600
AGGTTTTCCA ATGCTTAAGG GACACTAACT CACGGCTTGG CCTAGGCTCC GAATTGAGGG 9660 
AGGACATCAA GGAGAAAGTT ATTAACTTGG GAGTTTACAT GCACAGCTCC CAGTGGTTTG 9720
AGCCCTTTCT .......... ACAGTCAAGA CTGAGATGAG GTCAGTGATT AAATCACAAA 9780
CCCATACTTG CCATAGGAGG AGACACACAC CTGTATTCTT CACTGGTAGT TCAGTTGAGT 9840
TGCTAATCTC TCGTGACCTT GTTGCTATAA TCAGTAAAGA GTCTCAACAT GTATATTACC 9900
TGACATTTGA ACTGGTTTTG ATGTATTGTG ATGTCATAGA GGGGAGGTTA ATGACAGAGA 9960
CCGCTATGAC TATTGATGCT AGGTATACAG AGCTTCTAGG AAGAGTCAGA TACATGTGGA 10020
AACTGATAGA TGGTTTCTTC CCTGCACTCG GGAATCCAAC TTATCAAATT GTAGCCATGC 10080
TGGAGCCTCT TTCACTTGCT TACCTGCAGC TGAGGGATAT AACAGTAGAA CTCAGAGGTG 10140
CTTTCCTTAA CCACTGCTTT ACTGAAATAC ATGATGTTCT TGACCAAAAC GGGTTTTCTG 10200
ATGAAGGTAC TTATCATGAG TTAATTGAAG CTCTAGATTA CATTTTCATA ACTGATGACA 10260
TACATCTGAC AGGGGAGATT TTCTCATTTT TCAGAAGTTT CGGCCACCCC AGACTTGAAG 10320
CAGTAACGGC TGCTGAAAAT GTTAGGAAAT ACATGAATCA GCCTAAAGTC ATTGTGTATG 10380
AGACTCTGAT GAAAGGTCAT GCCATATTTT GTGGAATCAT AATCAACGGC TATCGTGACA 10440
GGCACGGAGG CAGTTGGCCA CCGCTGACCC TCCCCCTGCA TGCTGCAGAC ACAATCCGGA 10500
ATGCTCAAGC TTCAGGTGAA GGGTTAACAC ATGAGCAGTG CGTTGATAAC TGGAAATCTT 10560
TTGCTGGAGT GAAATTTGGC TGCTTTATGC CTCTTAGCCT GGATAGTGAT CTGACAATGT 10620
ACCTAAAGGA CAAGGCACTT GCTGCTCTCC AAAGGGAATG GGATTCAGTT TACCCGAAAG 10680
AGTTCCTGCG TTACGACCCT CCCAAGGGAA CCGGGTCACG GAGGCTTGTA GATGTTTTCC 10740
TTAATGATTC GAGCTTTGAC CCATATGATG TGATAATGTA TGTTGTAAGT GGAGCTTACC 10800
TCCATGACCC TGAGTTCAAC CTGTCTTACA GCCTGAAAGA AAAGGAGATC AAGGAAACAG 10860
GTAGACTTTT TGCTAAAATG ACTTACAAAA TGAGGGCATG CCAAGTGATT GCTGAAAATC 10920
TAATCTCAAA CGGGATTGGC AAATATTTTA AGGACAATGG GATGGCCAAG GATGAGCACG 10980
ATTTGACTAA GGCACTCCAC ACTCTAGCTG TCTCAGGAGT CCCCAAAGAT CTCAAAGAAA 11040
GTCACAGGGG GGGGCCAGTC TTAAAAACCT ACTCCCGAAG CCCAGTCCAC ACAAGTACCA 11100
GGAACGTGAG AGCAGCAAAA GGGTTTATAG GGTTCCCTCA AGTAATTCGG CAGGACCAAG
ACACTGATCA TCCGGAGAAT ATGGAAGCTT ACGAGACAGT CAGTGCATTT ATCACGACTG 
ATCTCAAGAA GTACTGCCTT AATTGGAGAT ATGAGACCAT CAGCTTGTTT GCACAGAGGC 
TAAATGAGAT TTACGGATTG CCCTCATTTT TCCAGTGGCT GCATAAGAGG CTTGAGACCT 
CTGTCCTGTA TGTAAGTGAC CCTCATTGCC CCCCCGACCT TGACGCCCAT ATCCCGTTAT 
ATAAAGTCCC CAATGATCAA ATCTTCATTA AGTACCCTAT GGGAGGTATA GAAGGGTATT 
GTCAGAAGCT GTGGACCATC AGCACCATTC CCTATCTATA CCTGGCTGCT TATGAGAGCG 
GAGTAAGGAT TGCTTCGTTA GTGCAAGGGG ACAATCAGAC CATAGCCGTA ACAAAAAGGG 
TACCCAGCAC ATGGCCCTAC AACCTTAAGA AACGGGAAGC TGCTAGAGTA ACTAGAGATT 
ACTTTGTAAT TCTTAGGCAA AGGCTACATG ATATTGGCCA TCACCTCAAG GCAAATGAGA 
CAATTGTTTC ATCACATTTT TTTGTCTATT CAAAAGGAAT ATATTATGAT GGGCTACTTG 
TGTCCCAATC ACTCAAGAGC ATCGCAAGAT GTGTATTCTG GTCAGAGACT ATAGTTGATG 
AAACAAGGGC AGCATGCAGT AATATTGCTA CAACAATGGC TAAAAGCATC GAGAGAGGTT 
ATGACCGTTA CCTTGCATAT TCCCTGAACG TCCTAAAAGT GATACAGCAA ATTCTGATCT 
CTCTTGGCTT CACAATCAAT TCAACCATGA CCCGGGATGT AGTCATACCC CTCCTCACAA 
ACAACGACCT CTTAATAAGG ATGGCACTGT TGCCCGCTCC TATTGGGGGG ATGAATTATC 
TGAATATGAG CAGGCTGTTT GTCAGAAACA TCGGTGATCC AGTAACATCA TCAATTGCTG 
ATCTCAAGAG AATGATTCTC GCCTCACTAA TGCCTGAAGA GACCCTCCAT CAAGTAATGA 
CACAACAACC GGGGGACTCT TCATTCCTAG ACTGGGCTAG CGACCCTTAC TCAGCAAATC 
TTGTATGTGT CCAGAGCATC ACTAGACTCC TCAAGAACAT AACTGCAAGG TTTGTCCTGA 
TCCATAGTCC AAACCCAATG TTAAAAGGAT TATTCCATGA TGACAGTAAA GAAGAGGACG 
AGGGACTGGC GGCATTCCTC ATGGACAGGC ATATTATAGT ACCTAGGGCA GCTCATGAAA 
TCCTGGATCA TAGTGTCACA GGGGCAAGAG AGTCTATTGC AGGCATGCTG GATACCACAA 
AAGGCCTGAT TCGAGCCAGC ATGAGGAAGG GGGGGTTAAC CTCTCGAGTG ATAACCAGAT 
TGTCCAATTA TGACTATGAA CAATTCAGAG CAGGGATGGT GCTATTGACA GGAAGAAAGA 
GAAATGTCCT CATTGACAAA GAGTCATGTT CAGTGCAGCT GGCGAGAGCT CTAAGAAGCC
ATATGTGGGC GAGGCTAGCT CGAGGACGGC CTATTTACGG CCTTGAGGTC CCTGATGTAC
TAGAATCTAT GCGAGGCCAC CTTATTCGGC GTCATGAGAC ATGTGTCATC TGCGAGTGTG 
GATCAGTCAA CTACGGATGG TTTTTTGTCC CCTCGGGTTG CCAACTGGAT GATATTGACA 
AGGAAACATC ATCCTTGAGA GTCCCATATA TTGGTTCTAC CACTGATGAG AGAACAGACA 
TGAAGCTTGC CTTCGTAAGA GCCCCAAGTC GATCCTTGCG ATCTGCTGTT AGAATAGCAA 
CAGTGTACTC ATGGGCTTAC GGTGATGATG ATAGCTCTTG GAACGAAGCC TGGTTGTTGG 
CTAGGCAAAG GGCCAATGTG AGCCTGGAGG AGCTAAGGGT GATCACTCCC ATCTCAACTT 
CGACTAATTT AGCGCATAGG TTGAGGGATC GTAGCACTCA AGTGAAATAC TCAGGTACAT 
CCCTTGTCCG AGTGGCGAGG TATACCACAA TCTCCAACGA CAATCTCTCA TTTGTCATAT 
CAGATAAGAA GGTTGATACT AACTTTATAT ACCAACAAGG AATGCTTCTA GGGTTGGGTG 
TTTTAGAAAC ATTGTTTCGA CTCGAGAAAG ATACCGGATC ATCTAACACG GTATTACATC 
TTCACGTCGA AACAGATTGT TGCGTGATCC CGATGATAGA TCATCCCAGG ATACCCAGCT 
CCCGCAAGCT AGAGCTGAGG GCAGAGCTAT GTACCAACCC ATTGATATAT GATAATGCAC 
CTTTAATTGA CAGAGATACA ACAAGGCTAT ACACCCAGAG CCATAGGAGG CACCTTGTGG 
AATTTGTTAC ATGGTCCACA CCCCAACTAT ATCACATTTT AGCTAAGTCC ACAGCACTAT 
CTATGATTGA CCTGGTAACA AAATTTGAGA AGGACCATAT GAATGAAATT TCAGCTCTCA 
TAGGGGATGA CGATATCAAT AGTTTCATAA CTGAGTTTCT GCTCATAGAG CCAAGATTAT 
TCACTATCTA CTTGGGCCAG TGTGCGGCCA TCAATTGGGC ATTTGATGTA CATTATCATA 
GACCATCAGG GAAATATCAG ATGGGTGAGC TGTTGTCATC GTTCCTTTCT AGAATGAGCA 
AAGGAGTGTT TAAGGTGCTT GTCAATGCTC TAAGCCACCC AAAGATCTAC AAGAAATTCT 
GGCATTGTGG TATTATAGAG CCTATCCATG GTCCTTCACT TGATGCTCAA AACTTGCACA 
CAACTGTGTG CAACATGGTT TACACATGCT ATATGACCTA CCTCGACCTG TTGTTGAATG 
AAGAGTTAGA AGAGTTCACA TTTCTCTTGT GTGAAAGCGA CGAGGATGTA GTACCGGACA 
GATTCGACAA CATCCAGGCA AAACACTTAT GTGTTCTGGC AGATTTGTAC TGTCAACCAG 
GGGCCTGCCC ACCAATTCGA GGTCTAAGAC CGGTAGAGAA ATGTGCAGTT CTAACCGACC 
ATATCAAGGC AGAGGCTAGG TTATCTCCAG CAGGATCTTC GTGGAACATA AATCCAATTA
TTGTAGACCA TTACTCATGC TCTCTGACTT ATCTCCGGCG AGGATCGATC AAACAGATAA 14280
GATTGAGAGT TGATCCAGGA TTCATTTTCG ACGCCCTCGC TGAGGTAAAT GTCAGTCAGC 14340
CAAAGATCGG CAGCAACAAC ATCTCAAATA TGAGCATCAA GGCTTTCAGA CCCCCACACG 14400
ATGATGTTGC AAAATTGCTC AAAGATATCA ACACAAGCAA GCACAATCTT CCCATTTCAG 14460
GGGGCAATCT CGCCAATTAT GAAATCCATG CTTTCCGCAG AATCGGGTTG AACTCATCTG 14520
CTTGCTACAA AGCTGTTGAG ATATCAACAT TAATTAGGAG ATGCCTTGAG CCAGGGGAGG 14580
ACGGCTTGTT CTTGGGTGAG GGATCGGGTT CTATGTTGAT CACTTATAAG GAGATACTTA 14640
AACTAAACAA GTGCTTCTAT AATAGTGGGG TTTCCGCCAA TTCTAGATCT GGTCAAAGGG 14700
AATTAGCACC CTATCCCTCC GAAGTTGGCC TTGTCGAACA CAGAATGGGA GTAGGTAATA 14760
TTGTCAAAGT GCTCTTTAAC GGGAGGCCCG AAGTCACGTG GGTAGGCAGT GTAGATTGCT 14820
TCAATTTCAT AGTTAGTAAT ATCCCTACCT CTAGTGTGGG GTTTATCCAT TCAGATATAG 14880
AGACCTTGCC TAACAAAGAT ACTATAGAGA AGCTAGAGGA ATTGGCAGCC ATCTTATCGA 14940
TGGCTCTGCT CCTGGGCAAA ATAGGATCAA TACTGGTGAT TAAGCTTATG CCTTTCAGCG 15000
GGGATTTTGT TCAGGGATTT ATAAGTTATG TAGGGTCTTA TTATAGAGAA GTGAACCTTG 15060
TATACCCTAG ATACAGCAAC TTCATATCTA CTGAATCTTA TTTGGTTATG ACAGATCTCA 15120
AGGCTAACCG GCTAATGAAT CCTGAAAAGA TTAAGCAGCA GATAATTGAA TCATCTGTGA 15180
GGACTTCACC TGGACTTATA GGTCACATCC TATCCATTAA GCAACTAAGC TGCATACAAG 15240
CAATTGTGGG AGACGCAGTT AGTAGAGGTG ATATCAATCC TACTCTGAAA AAACTTACAC 15300
CTATAGAGCA GGTGCTGATC AATTGCGGGT TGGCAATTAA CGGACCTAAG CTGTGCAAAG 15360
AATTGATCCA CCATGATGTT GCCTCAGGGC AAGATGGATT GCTTAATTCT ATACTCATCC 15420
TCTACAGGGA GTTGGCAAGA TTCAAAGACA ACCGAAGAAG TCAACAAGGG ATGTTCCACG 15480
CTTACCCCGT ATTGGTAAGT AGCAGGCAAC GAGAACTTAT ATCTAGGATC ACCCGCAAAT 15540
TTTGGGGGCA CATTCTTCTT TACTCCGGGA ACAGAAAGTT GATAAATAAG TTTATCCAGA 15600
ATCTCAAGTC CGGCTATCTG ATACTAGACT TACACCAGAA TATCTTCGTT AAGAATCTAT 15660
CCAAGTCAGA GAAACAGATT ATTATGACGG GGGGTTTGAA ACGTGAGTGG GTTTTTAAGG 15720
TAACAGTCAA GGAGACCAAA GAATGGTATA AGTTAGTCGG ATACAGTGCC CTGATTAAGG 15780






WO 98/13501 PCT/US97/16718 






ACTAATTGAT TGAACTCCGG AACCCTAATC CTGCCCTAGG TGGTTAGGCA TTATTTGCAA 15840
TATATTAAAG AAAACTTTGA AAATACGAAG TTTCTATTCC CAGCTTTGTC TGGT 15894 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2183 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Met Asp Ser Leu Ser Val Asn Gln Ile Leu Tyr Pro Glu Val His Leu
1 5 10 15

Asp Ser Pro Ile Val Thr Asn Lys Ile Val Ala Ile Leu Glu Tyr Ala
20 25 30

Arg Val Pro His Ala Tyr Ser Leu Glu Asp Pro Thr Leu Cys Gln Asn
35 40 45

Ile Lys His Arg Leu Lys Asn Gly Phe Ser Asn Gln Met Ile Ile Asn
50 55 60

Asn Val Glu Val Gly Asn Val Ile Lys Ser Lys Leu Arg Ser Tyr Pro
65 70 75 80

Ala His Ser His Ile Pro Tyr Pro Asn Cys Asn Gln Asp Leu Phe Asn
85 90 95

Ile Glu Asp Lys Glu Ser Thr Arg Lys Ile Arg Glu Leu Leu Lys Lys
100 105 110

Gly Asn Ser Leu Tyr Ser Lys Val Ser Asp Lys Val Phe Gln Cys Leu
115 120 125

Arg Asp Thr Asn Ser Arg Leu Gly Leu Gly Ser Glu Leu Arg Glu Asp
130 135 140

Ile Lys Glu Lys Val Ile Asn Leu Gly Val Tyr Met His Ser Ser Gln
145 150 155 160

Trp Phe Glu Pro Phe Leu Phe Trp Phe Thr Val Lys Thr Glu Met Arg
165 170 175

Ser Val Ile Lys Ser Gln Thr His Thr Cys His Arg Arg Arg His Thr
180 185 190

Pro Val Phe Phe Thr Gly Ser Ser Val Glu Leu Leu Ile Ser Arg Asp
195 200 205

Leu Val Ala Ile Ile Ser Lys Glu Ser Gln His Val Tyr Tyr Leu Thr
210 215 220

Phe Glu Leu Val Leu Met Tyr Cys Asp Val Ile Glu Gly Arg Leu Met
225 230 235 240

Thr Glu Thr Ala Met Thr Ile Asp Ala Arg Tyr Thr Glu Leu Leu Gly
245 250 255

Arg Val Arg Tyr Met Trp Lys Leu Ile Asp Gly Phe Phe Pro Ala Leu
260 265 270

Gly Asn Pro Thr Tyr Gln Ile Val Ala Met Leu Glu Pro Leu Ser Leu
275 280 285

Ala Tyr Leu Gln Leu Arg Asp Ile Thr Val Glu Leu Arg Gly Ala Phe
290 295 300

Leu Asn His Cys Phe Thr Glu Ile His Asp Val Leu Asp Gln Asn Gly
305 310 315 320

Phe Ser Asp Glu Gly Thr Tyr His Glu Leu Ile Glu Ala Leu Asp Tyr
325 330 335

Ile Phe Ile Thr Asp Asp Ile His Leu Thr Gly Glu Ile Phe Ser Phe
340 345 350

Phe Arg Ser Phe Gly His Pro Arg Leu Glu Ala Val Thr Ala Ala Glu
355 360 365

Asn Val Arg Lys Tyr Met Asn Gln Pro Lys Val Ile Val Tyr Glu Thr
370 375 380

Leu Met Lys Gly His Ala Ile Phe Cys Gly Ile Ile Ile Asn Gly Tyr
385 390 395 400

Arg Asp Arg His Gly Gly Ser Trp Pro Pro Leu Thr Leu Pro Leu His
405 410 415

Ala Ala Asp Thr Ile Arg Asn Ala Gln Ala Ser Gly Glu Gly Leu Thr
420 425 430

His Glu Gln Cys Val Asp Asn Trp Lys Ser Phe Ala Gly Val Lys Phe
435 440 445

Gly Cys Phe Met Pro Leu Ser Leu Asp Ser Asp Leu Thr Met Tyr Leu
450 455 460

Lys Asp Lys Ala Leu Ala Ala Leu Gln Arg Glu Trp Asp Ser Val Tyr
465 470 475 480

Pro Lys Glu Phe Leu Arg Tyr Asp Pro Pro Lys Gly Thr Gly Ser Arg
485 490 495

Arg Leu Val Asp Val Phe Leu Asn Asp Ser Ser Phe Asp Pro Tyr Asp
500 505 510

Val Ile Met Tyr Val Val Ser Gly Ala Tyr Leu His Asp Pro Glu Phe
515 520 525

Asn Leu Ser Tyr Ser Leu Lys Glu Lys Glu Ile Lys Glu Thr Gly Arg
530 535 540

Leu Phe Ala Lys Met Thr Tyr Lys Met Arg Ala Cys Gln Val Ile Ala
545 550 555 560

Glu Asn Leu Ile Ser Asn Gly Ile Gly Lys Tyr Phe Lys Asp Asn Gly
565 570 575

Met Ala Lys Asp Glu His Asp Leu Thr Lys Ala Leu His Thr Leu Ala
580 585 590

Val Ser Gly Val Pro Lys Asp Leu Lys Glu Ser His Arg Gly Gly Pro
595 600 605

Val Leu Lys Thr Tyr Ser Arg Ser Pro Val His Thr Ser Thr Arg Asn
610 615 620

Val Arg Ala Ala Lys Gly Phe Ile Gly Phe Pro Gln Val Ile Arg Gln
625 630 635 640

Asp Gln Asp Thr Asp His Pro Glu Asn Met Glu Ala Tyr Glu Thr Val
645 650 655

Ser Ala Phe Ile Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Thr Ile Ser Leu Phe Ala Gln Arg Leu Asn Glu Ile Tyr Gly
675 680 685

Leu Pro Ser Phe Phe Gln Trp Leu His Lys Arg Leu Glu Thr Ser Val
690 695 700

Leu Tyr Val Ser Asp Pro His Cys Pro Pro Asp Leu Asp Ala His Ile
705 710 715 720

Pro Leu Tyr Lys Val Pro Asn Asp Gln Ile Phe Ile Lys Tyr Pro Met
725 730 735

Gly Gly Ile Glu Gly Tyr Cys Gln Lys Leu Trp Thr Ile Ser Thr Ile
740 745 750

Pro Tyr Leu Tyr Leu Ala Ala Tyr Glu Ser Gly Val Arg Ile Ala Ser
755 760 765

Leu Val Gln Gly Asp Asn Gln Thr Ile Ala Val Thr Lys Arg Val Pro
770 775 780

Ser Thr Trp Pro Tyr Asn Leu Lys Lys Arg Glu Ala Ala Arg Val Thr
785 790 795 800

Arg Asp Tyr Phe Val Ile Leu Arg Gln Arg Leu His Asp Ile Gly His
805 810 815

His Leu Lys Ala Asn Glu Thr Ile Val Ser Ser His Phe Phe Val Tyr
820 825 830

Ser Lys Gly Ile Tyr Tyr Asp Gly Leu Leu Val Ser Gln Ser Leu Lys
835 840 845

Ser Ile Ala Arg Cys Val Phe Trp Ser Glu Thr Ile Val Asp Glu Thr
850 855 860

Arg Ala Ala Cys Ser Asn Ile Ala Thr Thr Met Ala Lys Ser Ile Glu
865 870 875 880

Arg Gly Tyr Asp Arg Tyr Leu Ala Tyr Ser Leu Asn Val Leu Lys Val
885 890 895

Ile Gln Gln Ile Leu Ile Ser Leu Gly Phe Thr Ile Asn Ser Thr Met
900 905 910

Thr Arg Asp Val Val Ile Pro Leu Leu Thr Asn Asn Asp Leu Leu Ile
915 920 925

Arg Met Ala Leu Leu Pro Ala Pro Ile Gly Gly Met Asn Tyr Leu Asn
930 935 940

Met Ser Arg Leu Phe Val Arg Asn Ile Gly Asp Pro Val Thr Ser Ser
945 950 955 960

Ile Ala Asp Leu Lys Arg Met Ile Leu Ala Ser Leu Met Pro Glu Glu
965 970 975

Thr Leu His Gln Val Met Thr Gln Gln Pro Gly Asp Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Ala Asn Leu Val Cys Val Gln Ser
995 1000 1005

Ile Thr Arg Leu Leu Lys Asn Ile Thr Ala Arg Phe Val Leu Ile His
1010 1015 1020

Ser Pro Asn Pro Met Leu Lys Gly Leu Phe His Asp Asp Ser Lys Glu
1025 1030 1035 1040

Glu Asp Glu Gly Leu Ala Ala Phe Leu Met Asp Arg His Ile Ile Val
1045 1050 1055

Pro Arg Ala Ala His Glu Ile Leu Asp His Ser Val Thr Gly Ala Arg
1060 1065 1070

Glu Ser Ile Ala Gly Met Leu Asp Thr Thr Lys Gly Leu Ile Arg Ala
1075 1080 1085

Ser Met Arg Lys Gly Gly Leu Thr Ser Arg Val Ile Thr Arg Leu Ser
1090 1095 1100

Asn Tyr Asp Tyr Glu Gln Phe Arg Ala Gly Met Val Leu Leu Thr Gly
1105 1110 1115 1120

Arg Lys Arg Asn Val Leu Ile Asp Lys Glu Ser Cys Ser Val Gln Leu
1125 1130 1135

Ala Arg Ala Leu Arg Ser His Met Trp Ala Arg Leu Ala Arg Gly Arg
1140 1145 1150

Pro Ile Tyr Gly Leu Glu Val Pro Asp Val Leu Glu Ser Met Arg Gly
1155 1160 1165

His Leu Ile Arg Arg His Glu Thr Cys Val Ile Cys Glu Cys Gly Ser
1170 1175 1180

Val Asn Tyr Gly Trp Phe Phe Val Pro Ser Gly Cys Gln Leu Asp Asp
1185 1190 1195 1200

Ile Asp Lys Glu Thr Ser Ser Leu Arg Val Pro Tyr Ile Gly Ser Thr
1205 1210 1215

Thr Asp Glu Arg Thr Asp Met Lys Leu Ala Phe Val Arg Ala Pro Ser
1220 1225 1230

Arg Ser Leu Arg Ser Ala Val Arg Ile Ala Thr Val Tyr Ser Trp Ala
1235 1240 1245

Tyr Gly Asp Asp Asp Ser Ser Trp Asn Glu Ala Trp Leu Leu Ala Arg
1250 1255 1260

Gln Arg Ala Asn Val Ser Leu Glu Glu Leu Arg Val Ile Thr Pro Ile
1265 1270 1275 1280

Ser Thr Ser Thr Asn Leu Ala His Arg Leu Arg Asp Arg Ser Thr Gln
1285 1290 1295

Val Lys Tyr Ser Gly Thr Ser Leu Val Arg Val Ala Arg Tyr Thr Thr
1300 1305 1310

Ile Ser Asn Asp Asn Leu Ser Phe Val Ile Ser Asp Lys Lys Val Asp
1315 1320 1325

Thr Asn Phe Ile Tyr Gln Gln Gly Met Leu Leu Gly Leu Gly Val Leu
1330 1335 1340

Glu Thr Leu Phe Arg Leu Glu Lys Asp Thr Gly Ser Ser Asn Thr Val
1345 1350 1355 1360

Leu His Leu His Val Glu Thr Asp Cys Cys Val Ile Pro Met Ile Asp
1365 1370 1375

His Pro Arg Ile Pro Ser Ser Arg Lys Leu Glu Leu Arg Ala Glu Leu
1380 1385 1390

Cys Thr Asn Pro Leu Ile Tyr Asp Asn Ala Pro Leu Ile Asp Arg Asp
1395 1400 1405

Thr Thr Arg Leu Tyr Thr Gln Ser His Arg Arg His Leu Val Glu Phe
1410 1415 1420

Val Thr Trp Ser Thr Pro Gln Leu Tyr His Ile Leu Ala Lys Ser Thr
1425 1430 1435 1440

Ala Leu Ser Met Ile Asp Leu Val Thr Lys Phe Glu Lys Asp His Met
1445 1450 1455

Asn Glu Ile Ser Ala Leu Ile Gly Asp Asp Asp Ile Asn Ser Phe Ile
1460 1465 1470

Thr Glu Phe Leu Leu Ile Glu Pro Arg Leu Phe Thr Ile Tyr Leu Gly
1475 1480 1485

Gln Cys Ala Ala Ile Asn Trp Ala Phe Asp Val His Tyr His Arg Pro
1490 1495 1500

Ser Gly Lys Tyr Gln Met Gly Glu Leu Leu Ser Ser Phe Leu Ser Arg
1505 1510 1515 1520

Met Ser Lys Gly Val Phe Lys Val Leu Val Asn Ala Leu Ser His Pro
1525 1530 1535

Lys Ile Tyr Lys Lys Phe Trp His Cys Gly Ile Ile Glu Pro Ile His
1540 1545 1550

Gly Pro Ser Leu Asp Ala Gln Asn Leu His Thr Thr Val Cys Asn Met
1555 1560 1565

Val Tyr Thr Cys Tyr Met Thr Tyr Leu Asp Leu Leu Leu Asn Glu Glu
1570 1575 1580

Leu Glu Glu Phe Thr Phe Leu Leu Cys Glu Ser Asp Glu Asp Val Val
1585 1590 1595 1600

Pro Asp Arg Phe Asp Asn Ile Gln Ala Lys His Leu Cys Val Leu Ala
1605 1610 1615

Asp Leu Tyr Cys Gln Pro Gly Ala Cys Pro Pro Ile Arg Gly Leu Arg
1620 1625 1630

Pro Val Glu Lys Cys Ala Val Leu Thr Asp His Ile Lys Ala Glu Ala
1635 1640 1645

Arg Leu Ser Pro Ala Gly Ser Ser Trp Asn Ile Asn Pro Ile Ile Val
1650 1655 1660

Asp His Tyr Ser Cys Ser Leu Thr Tyr Leu Arg Arg Gly Ser Ile Lys
1665 1670 1675 1680

Gln Ile Arg Leu Arg Val Asp Pro Gly Phe Ile Phe Asp Ala Leu Ala
1685 1690 1695

Glu Val Asn Val Ser Gln Pro Lys Ile Gly Ser Asn Asn Ile Ser Asn
1700 1705 1710

Met Ser Ile Lys Ala Phe Arg Pro Pro His Asp Asp Val Ala Lys Leu
1715 1720 1725

Leu Lys Asp Ile Asn Thr Ser Lys His Asn Leu Pro Ile Ser Gly Gly
1730 1735 1740

Asn Leu Ala Asn Tyr Glu Ile His Ala Phe Arg Arg Ile Gly Leu Asn
1745 1750 1755 1760

Ser Ser Ala Cys Tyr Lys Ala Val Glu Ile Ser Thr Leu Ile Arg Arg
1765 1770 1775

Cys Leu Glu Pro Gly Glu Asp Gly Leu Phe Leu Gly Glu Gly Ser Gly
1780 1785 1790

Ser Met Leu Ile Thr Tyr Lys Glu Ile Leu Lys Leu Asn Lys Cys Phe
1795 1800 1805

Tyr Asn Ser Gly Val Ser Ala Asn Ser Arg Ser Gly Gln Arg Glu Leu
1810 1815 1820

Ala Pro Tyr Pro Ser Glu Val Gly Leu Val Glu His Arg Met Gly Val
1825 1830 1835 1840

Gly Asn Ile Val Lys Val Leu Phe Asn Gly Arg Pro Glu Val Thr Trp
1845 1850 1855

Val Gly Ser Val Asp Cys Phe Asn Phe Ile Val Ser Asn Ile Pro Thr
1860 1865 1870

Ser Ser Val Gly Phe Ile His Ser Asp Ile Glu Thr Leu Pro Asn Lys
1875 1880 1885

Asp Thr Ile Glu Lys Leu Glu Glu Leu Ala Ala Ile Leu Ser Met Ala
1890 1895 1900

Leu Leu Leu Gly Lys Ile Gly Ser Ile Leu Val Ile Lys Leu Met Pro
1905 1910 1915 1920

Phe Ser Gly Asp Phe Val Gln Gly Phe Ile Ser Tyr Val Gly Ser Tyr
1925 1930 1935

Tyr Arg Glu Val Asn Leu Val Tyr Pro Arg Tyr Ser Asn Phe Ile Ser
1940 1945 1950



Thr Glu Ser Tyr Leu Val Met Thr Asp Leu Lys Ala Asn Arg Leu Met 
1955 1960 1965 

Asn Pro Glu Lys Ile Lys Gln Gln Ile Ile Glu Ser Ser Val Arg Thr
1970 1975 1980

Ser Pro Gly Leu Ile Gly His Ile Leu Ser Ile Lys Gln Leu Ser Cys
1985 1990 1995 2000

Ile Gln Ala Ile Val Gly Asp Ala Val Ser Arg Gly Asp Ile Asn Pro
2005 2010 2015

Thr Leu Lys Lys Leu Thr Pro Ile Glu Gln Val Leu Ile Asn Cys Gly
2020 2025 2030

Leu Ala Ile Asn Gly Pro Lys Leu Cys Lys Glu Leu Ile His His Asp
2035 2040 2045

Val Ala Ser Gly Gln Asp Gly Leu Leu Asn Ser Ile Leu Ile Leu Tyr
2050 2055 2060

Arg Glu Leu Ala Arg Phe Lys Asp Asn Arg Arg Ser Gln Gln Gly Met
2065 2070 2075 2080

Phe His Ala Tyr Pro Val Leu Val Ser Ser Arg Gln Arg Glu Leu Ile
2085 2090 2095

Ser Arg Ile Thr Arg Lys Phe Trp Gly His Ile Leu Leu Tyr Ser Gly
2100 2105 2110

Asn Arg Lys Leu Ile Asn Lys Phe Ile Gln Asn Leu Lys Ser Gly Tyr
2115 2120 2125

Leu Ile Leu Asp Leu His Gln Asn Ile Phe Val Lys Asn Leu Ser Lys
2130 2135 2140

Ser Glu Lys Gln Ile Ile Met Thr Gly Gly Leu Lys Arg Glu Trp Val
2145 2150 2155 2160

Phe Lys Val Thr Val Lys Glu Thr Lys Glu Trp Tyr Lys Leu Val Gly
2165 2170 2175

Tyr Ser Ala Leu Ile Lys Asp
2180



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

ACCAAACAAG AGAAGAAACT TGTCTGGGAA TATAAATTTA ACTTTAAATT AACTTAGGAT 60 

TAAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120 

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180 

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240 

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300 

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360 

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGTCAA GTATGTCATA TACATGATTG 420 

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480 

ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 540 

TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 600 

CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 660 

TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 720
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 
GAGTGACACA CGAATCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCCGG 
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAT ATTGATCAGG AAACTGTACA 
GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 
AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 
CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA
TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG
TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 
TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 
AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 
CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 
ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 
AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 
CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 
AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 
AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 
TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 
CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 
AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 
GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 
TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 
CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 
ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 
CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 
AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 
TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 
AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 
CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 
ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT
AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 
CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA
ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC
ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 
ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 
GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 
CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 
CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 
TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 
AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 
CTAAGTCAAT GGCATCACTA TCTCTACCCA ACACAATATC AATCAATCTG CAGGTACACA 
TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 
AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT
CTTTGGAGGO GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC
GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 
AATTAGGGAC ACAAACAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 
AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 
AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 
AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG
TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 
ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 
CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 
CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 
CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 
TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 
AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 
CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 
AATTATAACA CATAAAGAAT GTAGTACAAT AGGTATCAAC GGAATGCTGT TCAATACAAA 
TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTGC 
ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 
AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 
TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 
GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 
TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 
TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 
CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 
TGCTGGTAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 
AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGTTGAC AAAGGCTTAA ACTCAATTCC 
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 
TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 
ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA
ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 
GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 
ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 
CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 
TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 
AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 
TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 
GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 
TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA
AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 
AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 
TGATATTAGA TAAACAAAAC TATAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 
ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 
TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 
TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 
TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 
AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 
TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 
CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 
GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 
TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 
TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 
TATCATATGA AAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT
TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC
CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG
CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 
ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 
CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 
ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 
TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 
TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 
ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 
AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 
TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 
TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 
GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 
ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 
TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 
CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 
TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 
ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 
AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 
ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 
CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 
TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 
AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 
AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 
GATTCAATTA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 
CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 
ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT GGACTGGGCT TCAGATCCAT
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 
TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 
AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 
CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 
CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 
TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 
TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 
TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT CATAACAATG TCCAATGATA 
ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 
TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 
ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 
ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 
TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 
ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 
CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 
AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 
CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 
ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA
CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA
AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATACTG 
CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 
TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 
TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 
CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 
TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 
TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 
TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 
TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 
ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 
CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 
ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 
TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 
TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT
TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 
CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 
CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG
GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC
ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA
GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA
AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA
GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC
TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT
TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA
GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA
AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC
AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA
TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA
TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT

(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2233 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80 

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640 

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910 

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn Tyr Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Leu
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

WO 98/13501 PCT/US97/16718

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455



Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Thr Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp
1685 1690 1695

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740 

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly 
1745 1750 1755 1760 

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885

Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950

Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230

(2) INFORMATION FOR SEQ ID NO: 19:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 
TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 
TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 
TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 
ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 
AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 
AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 
AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 
ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 
TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 
CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 
TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 
GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 
GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 
AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 
CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 
TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG
TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 
TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 
AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA
CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 
ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 
AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 
CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 
AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 
AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 
TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT
CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 
AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 
GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 
TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 
CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 
ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 
CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 
AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 
TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 
AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 
CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 
ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 
AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 
CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 
ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 
ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 
ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 
GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 
CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 
CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 
TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 
AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 
CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA
TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 
AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 
GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 
AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 
AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 
AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 
AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 
TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 
ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 
CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 
CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 
CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA
TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 
AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 
CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 
AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA 
TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 
ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 
AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 
TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 
GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 
TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 
TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 
CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 
TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 
AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA 
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC 
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 
TAATGTGCTA TGAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 
ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 
ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 
GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 
ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 
CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 
TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 
AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 
TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 
GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 
TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 
AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA
AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 
TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 
ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 
TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 
TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 
TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 
AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 
TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 
CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 
GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 
TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 
TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 
TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 
TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 
CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 
CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 
ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 
CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 
ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 
TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 
TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 
ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 
AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 
TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 
TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC
GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC
ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 
TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 
CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 
TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 
ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 
AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 
ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 
CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 
TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 
AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 
AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 
GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 
CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 
ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 
TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 
AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 
CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 
CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 
TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 
TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT AAAGGATACT GCAACTCAGA 
TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 
ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 
TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 
ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 
ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 
TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 
ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 
CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA 
AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 
CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 
ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 
CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 
AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 
CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 
TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 
TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 
CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 
TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 
TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 
TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 
TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 
ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG 
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 
CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 
ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 
TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 
TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT 
TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 
CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 
CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 
GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 
ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 
GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 
AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 
GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 
TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 
TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 
GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 
AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 
AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 
TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 
TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 15462 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2233 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr Pro 
1 5 10 15 

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln Leu 
20 25 30 

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser 
35 40 45 

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys 
50 55 60 

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val 
65 70 75 80 

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys 
85 90 95 

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu 
100 105 110 

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu 
115 120 125 

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp 
130 135 140 

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val 
145 150 155 160 

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp 
165 170 175 

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu 
180 185 190 

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys 
195 200 205 

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln 
210 215 220 

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys 
225 230 235 240 

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp 
245 250 255 

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val 
260 265 270 

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile 
275 280 285 

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro 
290 295 300 

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met 
305 310 315 320 

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val 
325 330 335 

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp 
340 345 350 

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro 
355 360 365 

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile 
370 375 380 

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe 
385 390 395 400 

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp 
405 410 415 

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala 
420 425 430 

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr 
435 440 445 

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu 
450 455 460 
Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys 
465 470 475 480 

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg 
485 490 495 

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala 
500 505 510 

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly 
515 520 525 

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu 
530 535 540 

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys 
545 550 555 560 

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile 
565 570 575 

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu 
580 585 590 

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn 
595 600 605 

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr 
610 615 620 

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys 
625 630 635 640 

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val 
645 650 655 

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg 
660 665 670 

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly 
675 680 685 

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr 
690 695 700 

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile 
705 710 715 720 

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg 
725 730 735 
Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile 
740 745 750 

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala 
755 760 765 

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro 
770 775 780 

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val 
785 790 795 800 

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His 
805 810 815 

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr 
820 825 830 

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys 
835 840 845 

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr 
850 855 860 

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu 
865 870 875 880 

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn 
885 890 895 

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile 
900 905 910 

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln 
915 920 925 

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala 
930 935 940 

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala 
945 950 955 960 

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser 
965 970 975 

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe 
980 985 990 

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn 
995 1000 1005 

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp 
1010 1015 1020 

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu 
1025 1030 1035 1040 

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu 
1045 1050 1055 

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg 
1060 1065 1070 

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val 
1075 1080 1085 

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser 
1090 1095 1100 

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu 
1105 1110 1115 1120 

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu 
1125 1130 1135 

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg 
1140 1145 1150 

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly 
1155 1160 1165 

Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp 
1170 1175 1180 

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile 
1185 1190 1195 1200 

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly 
1205 1210 1215 

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn 
1220 1225 1230 

Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr 
1235 1240 1245 

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile 
1250 1255 1260 

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr 
1265 1270 1275 1280 

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Leu Lys Asp Thr Ala 
1285 1290 1295 
Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe 
1300 1305 1310 

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr 
1315 1320 1325 

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser 
1330 1335 1340 

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro 
1345 1350 1355 1360 

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser 
1365 1370 1375 

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg 
1380 1385 1390 

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp 
1395 1400 1405 

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile 
1410 1415 1420 

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile 
1425 1430 1435 1440 

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp 
1445 1450 1455 

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser 
1460 1465 1470 

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr 
1475 1480 1485 



Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu 
1490 1495 1500 

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu 
1505 1510 1515 1520 

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser 
1525 1530 1535 

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro 
1540 1545 1550 



Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu 
1555 1560 1565 



Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn 
1570 1575 1580 

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala 
1585 1590 1595 1600 

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys 
1605 1610 1615 

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr 
1620 1625 1630 

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile 
1635 1640 1645 

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile 
1650 1655 1660 

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys 
1665 1670 1675 1680 

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp 
1685 1690 1695 

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile 
1700 1705 1710 

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe 
1715 1720 1725 

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile 
1730 1735 1740 

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly 
1745 1750 1755 1760 

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu 
1765 1770 1775 

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln 
1780 1785 1790 

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly 
1795 1800 1805 

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro 
1810 1815 1820 

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly 
1825 1830 1835 1840 

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys 
1845 1850 1855 

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe 
1860 1865 1870 

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser 
1875 1880 1885 



Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys 
1890 1895 1900 

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu 
1905 1910 1915 1920 

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val 
1925 1930 1935 

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg 
1940 1945 1950 



Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser 
1955 1960 1965 

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys 
1970 1975 1980 

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys 
1985 1990 1995 2000 

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile 
2005 2010 2015 

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys 
2020 2025 2030 

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu 
2035 2040 2045 

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe 
2050 2055 2060 

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe 
2065 2070 2075 2080 

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His 
2085 2090 2095 

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu 
2100 2105 2110 

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser 
2115 2120 2125 
Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro 
2130 2135 2140 

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala 
2145 2150 2155 2160 

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys 
2165 2170 2175 

Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys 
2180 2185 2190 

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly 
2195 2200 2205 

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr 
2210 2215 2220 

Asn Gln His Asp Glu Phe Asp Ile Asp 
2225 2230 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15462 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

ACCAAACAAG AGAAGAAACT TGCTTGGTAA TATAAATTTA ACTTAAAATT AACTTAGGAT 60 

TTAAGACATT GACTAGAAGG TCAAGAAAAG GGAACTCTAT AATTTCAAAA ATGTTGAGCC 120 

TATTTGATAC ATTTAATGCA CGTAGGCAAG AAAACATAAC AAAATCAGCC GGTGGAGCTA 180 

TCATTCCTGG ACAGAAAAAT ACTGTCTCTA TATTCGCCCT TGGACCGACA ATAACTGATG 240 

ATAATGAGAA AATGACATTA GCTCTTCTAT TTCTATCTCA TTCACTAGAT AATGAGAAAC 300 

AACATGCACA AAGGGCAGGG TTCTTGGTGT CTTTATTGTC AATGGCTTAT GCCAATCCAG 360 

AGCTCTACCT AACAACAAAT GGAAGTAATG CAGATGCCAA GTATGTCATA TACATGATTG 420 

AGAAAGATCT AAAACGGCAA AAGTATGGAG GATTTGTGGT TAAGACGAGA GAGATGATAT 480 
ATGAAAAGAC AACTGATTGG ATATTTGGAA GTGACCTGGA TTATGATCAG GAAACTATGT 
TGCAGAACGG CAGGAACAAT TCAACAATTG AAGACCTTGT CCACACATTT GGGTATCCAT 
CATGTTTAGG AGCTCTTATA ATACAGATCT GGATAGTTCT GGTCAAAGCT ATCACTAGTA 
TCTCAGGGTT AAGAAAAGGC TTTTTCACCC GATTGGAAGC TTTCAGACAA GATGGAACAG 
TGCAGGCAGG GCTGGTATTG AGCGGTGACA CAGTGGATCA GATTGGGTCA ATCATGCGGT 
CTCAACAGAG CTTGGTAACT CTTATGGTTG AAACATTAAT AACAATGAAT ACCAGCAGAA 
ATGACCTCAC AACCATAGAA AAGAATATAC AAATTGTTGG CAACTACATA AGAGATGCAG 
GTCTCGCTTC ATTCTTCAAT ACAATCAGAT ATGGAATTGA GACCAGAATG GCAGCTTTGA 
CTCTATCCAC TCTCAGACCA GATATCAATA GATTAAAAGC TTTGATGGAA CTGTATTTAT 
CAAAGGGACC ACGCGCTCCT TTCATCTGTA TCCTCAGAGA TCCTATACAT GGTGAGTTCG 
CACCAGGCAA CTATCCTGCC ATATGGAGCT ATGCAATGGG GGTGGCAGTT GTACAAAATA 
GAGCCATGCA ACAGTATGTG ACGGGAAGAT CATATCTAGA CATTGATATG TTCCAGCTAG 
GACAAGCAGT AGCACGTGAT GCCGAAGCTC AAATGAGCTC AACACTGGAA GATGAACTTG 
GAGTGACACA CGAAGCTAAA GAAAGCTTGA AGAGACATAT AAGGAACATA AACAGTTCAG 
AGACATCTTT CCACAAACCG ACAGGTGGAT CAGCCATAGA GATGGCAATA GATGAAGAGC 
CAGAACAATT CGAACATAGA GCAGATCAAG AACAAAATGG AGAACCTCAA TCATCCATAA 
TTCAATATGC CTGGGCAGAA GGAAATAGAA GCGATGATCA GACTGAGCAA GCTACAGAAT 
CTGACAATAT CAAGACCGAA CAACAAAACA TCAGAGACAG ACTAAACAAG AGACTCAACG 
ACAAGAAGAA ACAAAGCAGT CAACCACCCA CTAATCCCAC AAACAGAACA AACCAGGACG 
AAATAGATGA TCTGTTTAAC GCATTTGGAA GCAACTAATC GAATCAACAT TTTAATCTAA 
ATCAATAATA AATAAGAAAA ACTTAGGATT AAAGAATCCT ATCATACCGG AATATAGGGT 
GGTAAATTTA GAGTCTGCTT GAAACTCAAT CAATAGAGAG TTGATGGAAA GCGATGCTAA 
AAACTATCAA ATCATGGATT CTTGGGAAGA GGAATCAAGA GATAAATCAA CTAATATCTC 
CTCGGCCCTC AACATCATTG AATTCATACT CAGCACCGAC CCCCAAGAAG ACTTATCGGA 
AAACGACACA ATCAACACAA GAACCCAGCA ACTCAGTGCC ACCATCTGTC AACCAGAAAT 
CAAACCAACA GAAACAAGTG AGAAAGATAG TGGATCAACT GACAAAAATA GACAGTCTGG 
GTCATCACAC GAATGTACAA CAGAAGCAAA AGATAGAAAC ATTGATCAGG AAACTGTACA 
GAGAGGACCT GGGAGAAGAA GCAGCTCAGA TAGTAGAGCT GAGACTGTGG TCTCTGGAGG 
AATCCCCAGA AGCATCACAG ATTCTAAAAA TGGAACCCAA AACACGGAGG ATATTGATCT 
CAATGAAATT AGAAAGATGG ATAAGGACTC TATTGAGGGG AAAATGCGAC AATCTGCAAA 
TGTTCCAAGC GAGATATCAG GAAGTGATGA CATATTTACA ACAGAACAAA GTAGAAACAG 
TGATCATGGA AGAAGCCTGG AATCTATCAG TACACCTGAT ACAAGATCAA TAAGTGTTGT 
TACTGCTGCA ACACCAGATG ATGAAGAAGA AATACTAATG AAAAATAGTA GGACAAAGAA 
AAGTTCTTCA ACACATCAAG AAGATGACAA AAGAATTAAA AAAGGGGGAA AAGGGAAAGA 
CTGGTTTAAG AAATCAAAAG ATACCGACAA CCAGATACCA ACATCAGACT ACAGATCCAC 
ATCAAAAGGG CAGAAGAAAA TCTCAAAGAC AACAACCACC AACACCGACA CAAAGGGGCA 
AACAGAAATA CAGACAGAAT CATCAGAAAC ACAATCCTCA TCATGGAATC TCATCATCGA 
CAACAACACC GACCGGAACG AACAGACAAG CACAACTCCT CCAACAACAA CTTCCAGATC 
AACTTATACA AAAGAATCGA TCCGAACAAA CTCTGAATCC AAACCCAAGA CACAAAAGAC 
AAATGGAAAG GAAAGGAAGG ATACAGAAGA GAGCAATCGA TTTACAGAGA GGGCAATTAC 
TCTATTGCAG AATCTTGGTG TAATTCAATC CACATCAAAA CTAGATTTAT ATCAAGACAA 
ACGAGTTGTA TGTGTAGCAA ATGTACTAAA CAATGTAGAT ACTGCATCAA AGATAGATTT 
CCTGGCAGGA TTAGTCATAG GGGTTTCAAT GGACAACGAC ACAAAATTAA CACAGATACA 
AAATGAAATG CTAAACCTCA AAGCAGATCT AAAGAAAATG GACGAATCAC ATAGAAGATT 
GATAGAAAAT CAAAGAGAAC AACTGTCATT GATCACGTCA CTAATTTCAA ATCTCAAAAT 
TATGACTGAG AGAGGAGGAA AGAAAGACCA AAATGAATCC AATGAGAGAG TATCCATGAT 
CAAAACAAAA TTGAAAGAAG AAAAGATCAA GAAGACCAGG TTTGACCCAC TTATGGAGGC 
ACAAGGCATT GACAAGAATA TACCCGATCT ATATCGACAT GCAGGAGATA CACTAGAGAA 
CGATGTACAA GTTAAATCAG AGATATTAAG TTCATACAAT GAGTCAAATG CAACAAGACT 
AATACCCAAA AAAGTGAGCA GTACAATGAG ATCACTAGTT GCAGTCATCA ACAACAGCAA 
TCTCTCACAA AGCACAAAAC AATCATACAT AAACGAACTC AAACGTTGCA AAAATGATGA 
AGAAGTATCT GAATTAATGG ACATGTTCAA TGAAGATGTC AACAATTGCC AATGATCCAA 
CAAAGAAACG ACACCGAACA AACAGACAAG AAACAACAGT AGATCAAAAC CTGTCAACAC 
ACACAAAATC AAGCAGAATG AAACAACAGA TATCAATCAA TATACAAATA AGAAAAACTT 
AGGATTAAAG AATAAATTAA TCCTTGTCCA AAATGAGTAT AACTAACTCT GCAATATACA 
CATTCCCAGA ATCATCATTC TCTGAAAATG GTCATATAGA ACCATTACCA CTCAAAGTCA 
ATGAACAGAG GAAAGCAGTA CCCCACATTA GAGTTGCCAA GATCGGAAAT CCACCAAAAC 
ACGGATCCCG GTATTTAGAT GTCTTCTTAC TCGGCTTCTT CGAGATGGAA CGAATCAAAG 
ACAAATACGG GAGTGTGAAT GATCTCGACA GTGACCCGAG TTACAAAGTT TGTGGCTCTG 
GATCATTACC AATCGGATTG GCTAAGTACA CTGGGAATGA CCAGGAATTG TTACAAGCCG 
CAACCAAACT GGATATAGAA GTGAGAAGAA CAGTCAAAGC GAAAGAGATG GTTGTTTACA 
CGGTACAAAA TATAAAACCA GAACTGTACC CATGGTCCAA TAGACTAAGA AAAGGAATGC 
TGTTCGATGC CAACAAAGTT GCTCTTGCTC CTCAATGTCT TCCACTAGAT AGGAGCATAA 
AATTTAGAGT AATCTTCGTG AATTGTACGG CAATTGGATC AATAACCTTG TTCAAAATTC 
CTAAGTCAAT GGCATCACTA TCTCTAACCA ACACAATATC AATCAATCTG CAGGTACACA 
TAAAAACAGG GGTTCAGACT GATTCTAAAG GGATAGTTCA AATTTTGGAT GAGAAAGGCG 
AAAAATCACT GAATTTCATG GTCCATCTCG GATTGATCAA AAGAAAAGTA GGCAGAATGT 
ACTCTGTTGA ATACTGTAAA CAGAAAATCG AGAAAATGAG ATTGATATTT TCTTTAGGAC 
TAGTTGGAGG AATCAGTCTT CATGTCAATG CAACTGGGTC CATATCAAAA ACACTAGCAA 
GTCAGCTGGT ATTCAAAAGA GAGATTTGTT ATCCTTTAAT GGATCTAAAT CCGCATCTCA 
ATCTAGTTAT CTGGGCTTCA TCAGTAGAGA TTACAAGAGT GGATGCAATT TTCCAACCTT 
CTTTACCTGG CGAGTTCAGA TACTATCCTA ATATTATTGC AAAAGGAGTT GGGAAAATCA 
AACAATGGAA CTAGTAATCT CTATTTTAGT CCGGACGTAT CTATTAAGCC GAAGCAAATA 
AAGGATAATC AAAAACTTAG GACAAAAGAG GTCAATACCA ACAACTATTA GCAGTCACAC 
TCGCAAGAAT AAGAGAGAAG GGACCAAAAA AGTCAAATAG GAGAAATCAA AACAAAAGGT 
ACAGAACACC AGAACAACAA AATCAAAACA TCCAACTCAC TCAAAACAAA AATTCCAAAA 
GAGACCGGCA ACACAACAAG CACTGAACAC AATGCCAACT TCAATACTGC TAATTATTAC 
AACCATGATC ATGGCATCTT TCTGCCAAAT AGATATCACA AAACTACAGC ACGTAGGTGT 
ATTGGTCAAC AGTCCCAAAG GGATGAAGAT ATCACAAAAC TTTGAAACAA GATATCTAAT 
TTTGAGCCTC ATACCAAAAA TAGAAGACTC TAACTCTTGT GGTGACCAAC AGATCAAGCA 
ATACAAGAAG TTATTGGATA GACTGATCAT CCCTTTATAT GATGGATTAA GATTACAGAA 
AGATGTGATA GTAACCAATC AAGAATCCAA TGAAAACACT GATCCCAGAA CAAAACGATT 
CTTTGGAGGG GTAATTGGAA CCATTGCTCT GGGAGTAGCA ACCTCAGCAC AAATTACAGC 
GGCAGTTGCT CTGGTTGAAG CCAAGCAGGC AAGATCAGAC ATCGAAAAAC TCAAAGAAGC 
AATTAGGGAC ACAAATAAAG CAGTGCAGTC AGTTCAGAGC TCCATAGGAA ATTTAATAGT 
AGCAATTAAA TCAGTCCAGG ATTATGTTAA CAAAGAAATC GTGCCATCGA TTGCGAGGCT 
AGGTTGTGAA GCAGCAGGAC TTCAATTAGG AATTGCATTA ACACAGCATT ACTCAGAATT 
AACAAACATA TTTGGTGATA ACATAGGATC GTTACAAGAA AAAGGAATAA AATTACAAGG 
TATAGCATCA TTATACCGCA CAAATATCAC AGAAATATTC ACAACATCAA CAGTTGATAA 
ATATGATATC TATGATCTGT TATTTACAGA ATCAATAAAG GTGAGAGTTA TAGATGTTGA 
CTTGAATGAT TACTCAATCA CCCTCCAAGT CAGACTCCCT TTATTAACTA GGCTGCTGAA 
CACTCAGATC TACAAAGTAG ATTCCATATC ATATAACATC CAAAACAGAG AATGGTATAT 
CCCTCTTCCC AGCCATATCA TGACGAAAGG GGCATTTCTA GGTGGAGCAG ACGTCAAAGA 
ATGTATAGAA GCATTCAGCA GCTATATATG CCCTTCTGAT CCAGGATTTG TATTAAACCA 
TGAAATAGAG AGCTGCTTAT CAGGAAACAT ATCCCAATGT CCAAGAACAA CGGTCACATC 
AGACATTGTT CCAAGATATG CATTTGTCAA TGGAGGAGTG GTTGCAAACT GTATAACAAC 
CACCTGTACA TGCAACGGAA TTGGTAATAG AATCAATCAA CCACCTGATC AAGGAGTAAA 
AATTATAACA CATAAAGAAT GTAGTACAGT AGGTATCAAC GGAATGCTGT TCAATACAAA 
TAAAGAAGGA ACTCTTGCAT TCTATACACC AAATGATATA ACACTAAACA ATTCTGTTAC 
ACTTGATCCA ATTGACATAT CAATCGAGCT CAACAAGGCC AAATCAGATC TAGAAGAATC 
AAAAGAATGG ATAAGAAGGT CAAATCAAAA ACTAGATTCT ATTGGAAATT GGCATCAATC 
TAGCACTACA ATCATAATTA TTTTGATAAT GATCATTATA TTGTTTATAA TTAATATAAC 
GATAATTACA ATTGCAATTA AGTATTACAG AATTCAAAAG AGAAATCGAG TGGATCAAAA 
TGACAAGCCA TATGTACTAA CAAACAAATA ACATATCTAC AGATCATTAG ATATTAAAAT 
TATAAAAAAC TTAGGAGTAA AGTTACGCAA TCCAACTCTA CTCATATAAT TGAGGAAGGA 6780 
CCCAATAGAC AAATCCAAAT TCGAGATGGA ATACTGGAAG CATACCAATC ACGGAAAGGA 6840 
TGCTGGCAAT GAGCTGGAGA CGTCTATGGC TACTCATGGC AACAAGCTCA CTAATAAGAT 6900 
AATATACATA TTATGGACAA TAATCCTGGT GTTATTATCA ATAGTCTTCA TCATAGTGCT 6960 
AATTAATTCC ATCAAAAGTG AAAAGGCCCA CGAATCATTG CTGCAAGACA TAAATAATGA 7020 
GTTTATGGAA ATTACAGAAA AGATCCAAAT GGCATCGGAT AATACCAATG ATCTAATACA 7080 
GTCAGGAGTG AATACAAGGC TTCTTACAAT TCAGAGTCAT GTCCAGAATT ACATACCAAT 7140 
ATCATTGACA CAACAGATGT CAGATCTTAG GAAATTCATT AGTGAAATTA CAATTAGAAA 7200 
TGATAATCAA GAAGTGCTGC CACAAAGAAT AACACATGAT GTAGGTATAA AACCTTTAAA 7260 
TCCAGATGAT TTTTGGAGAT GCACGTCTGG TCTTCCATCT TTAATGAAAA CTCCAAAAAT 7320 
AAGGTTAATG CCAGGGCCGG GATTATTAGC TATGCCAACG ACTGTTGATG GCTGTGTTAG 7380 
AACTCCGTCT TTAGTTATAA ATGATCTGAT TTATGCTTAT ACCTCAAATC TAATTACTCG 7440 
AGGTTGTCAG GATATAGGAA AATCATATCA AGTCTTACAG ATAGGGATAA TAACTGTAAA 7500 
CTCAGACTTG GTACCTGACT TAAATCCTAG GATCTCTCAT ACCTTTAACA TAAATGACAA 7560 
TAGGAAGTCA TGTTCTCTAG CACTCCTAAA TACAGATGTA TATCAACTGT GTTCAACTCC 7620 
CAAAGTTGAT GAAAGATCAG ATTATGCATC ATCAGGCATA GAAGATATTG TACTTGATAT 7680 
TGTCAATTAT GATGGTTCAA TCTCAACAAC AAGATTTAAG AATAATAACA TAAGCTTTGA 7740 
TCAACCATAT GCTGCACTAT ACCCATCTGT TGGACCAGGG ATATACTACA AAGGCAAAAT 7800 
AATATTTCTC GGGTATGGAG GTCTTGAACA TCCAATAAAT GAGAATGTAA TCTGCAACAC 7860 
AACTGGGTGC CCCGGGAAAA CACAGAGAGA CTGTAATCAA GCGTCTCATA GTCCATGGTT 7920 
TTCAGATAGG AGGATGGTCA ACTCCATCAT TGTTGCTGAC AAAGGCTTAA ACTCAATTCC 7980 
AAAATTGAAA GTATGGACGA TATCTATGCG ACAAAATTAC TGGGGGTCAG AAGGAAGGTT 8040 
ACTTCTACTA GGTAACAAGA TCTATATATA TACAAGATCT ACAAGTTGGC ATAGCAAGTT 8100 
ACAATTAGGA ATAATTGATA TTACTGATTA CAGTGATATA AGGATAAAAT GGACATGGCA 8160 
TAATGTGCTA TCAAGACCAG GAAACAATGA ATGTCCATGG GGACATTCAT GTCCAGATGG 8220 
ATGTATAACA GGAGTATATA CTGATGCATA TCCACTCAAT CCCACAGGGA GCATTGTGTC 8280 



ATCTGTCATA TTAGACTCAC AAAAATCGAG AGTGAACCCA GTCATAACTT ACTCAACAGC 
AACCGAAAGA GTAAACGAGC TGGCCATCCT AAACAGAACA CTCTCAGCTG GATATACAAC 
AACAAGCTGC ATTACACACT ATAACAAAGG ATATTGTTTT CATATAGTAG AAATAAATCA 
TAAAAGCTTA AACACATTTC AACCCATGTT GTTCAAAACA GAGATTCCAA AAAGCTGCAG 
TTAATCATAA TTAACCATAA TATGCATCAA TCTATCTATA ATACAAGTAT ATGATAAGTA 
ATCAGCAATC AGACAATAGA CAAAAGGGAA ATATAAAAAA CTTAGGAGCA AAGCGTGCTC 
GGGAAATGGA CACTGAATCT AACAATGGCA CTGTATCTGA CATACTCTAT CCTGAGTGTC 
ACCTTAACTC TCCTATCGTT AAAGGTAAAA TAGCACAATT ACACACTATT ATGAGTCTAC 
CTCAGCCTTA TGATATGGAT GACGACTCAA TACTAGTTAT CACTAGACAG AAAATAAAAC 
TTAATAAATT GGATAAAAGA CAACGATCTA TTAGAAGATT AAAATTAATA TTAACTGAAA 
AAGTGAATGA CTTAGGAAAA TACACATTTA TCAGATATCC AGAAATGTCA AAAGAAATGT 
TCAAATTATA TATACCTGGT ATTAACAGTA AAGTGACTGA ATTATTACTT AAAGCAGATA 
GAACATATAG TCAAATGACT GATGGATTAA GAGATCTATG GATTAATGTG CTATCAAAAT 
TAGCCTCAAA AAATGATGGA AGCAATTATG ATCTTAATGA AGAAATTAAT AATATATCGA 
AAGTTCACAC AACCTATAAA TCAGATAAAT GGTATAATCC ATTCAAAACA TGGTTTACTA 
TCAAGTATGA TATGAGAAGA TTACAAAAAG CTCGAAATGA GATCACTTTT AATGTTGGGA 
AGGATTATAA CTTGTTAGAA GACCAGAAGA ATTTCTTATT GATACATCCA GAATTGGTTT 
TGATATTAGA TAAACAAAAC TACAATGGTT ATCTAATTAC TCCTGAATTA GTATTGATGT 
ATTGTGACGT AGTCGAAGGC CGATGGAATA TAAGTGCATG TGCTAAGTTA GATCCAAAAT 
TACAATCTAT GTATCAGAAA GGTAATAACC TGTGGGAAGT GATAGATAAA TTGTTTCCAA 
TTATGGGAGA AAAGACATTT GATGTGATAT CGTTATTAGA ACCACTTGCA TTATCCTTAA 
TTCAAACTCA TGATCCTGTT AAACAACTAA GAGGAGCTTT TTTAAATCAT GTGTTATCCG 
AGATGGAATT AATATTTGAA TCTAGAGAAT CGATTAAGGA ATTTCTGAGT GTAGATTACA 
TTGATAAAAT TTTAGATATA TTTAATAAGT CTACAATAGA TGAAATAGCA GAGATTTTCT 
CTTTTTTTAG AACATTTGGG CATCCTCCAT TAGAAGCTAG TATTGCAGCA GAAAAGGTTA 
GAAAATATAT GTATATTGGA AAACAATTAA AATTTGACAC TATTAATAAA TGTCATGCTA 
TCTTCTGTAC AATAATAATT AACGGATATA GAGAGAGGCA TGGTGGACAG TGGCCTCCTG 
TGACATTACC TGATCATGCA CACGAATTCA TCATAAATGC TTACGGTTCA AACTCTGCGA 
TATCATATGA GAATGCTGTT GATTATTACC AGAGCTTTAT AGGAATAAAA TTCAATAAAT 
TCATAGAGCC TCAGTTAGAT GAGGATTTGA CAATTTATAT GAAAGATAAA GCATTATCTC 
CAAAAAAATC AAATTGGGAC ACAGTTTATC CTGCATCTAA TTTACTGTAC CGTACTAACG 
CATCCAACGA ATCACGAAGA TTAGTTGAAG TATTTATAGC AGATAGTAAA TTTGATCCTC 
ATCAGATATT GGATTATGTA GAATCTGGGG ACTGGTTAGA TGATCCAGAA TTTAATATTT 
CTTATAGTCT TAAAGAAAAA GAGATCAAAC AGGAAGGTAG ACTCTTTGCA AAAATGACAT 
ACAAAATGAG AGCTACACAA GTTTTATCAG AGACACTACT TGCAAATAAC ATAGGAAAAT 
TCTTTCAAGA AAATGGGATG GTGAAGGGAG AGATTGAATT ACTTAAGAGA TTAACAACCA 
TATCAATATC AGGAGTTCCA CGGTATAATG AAGTGTACAA TAATTCTAAA AGCCATACAG 
ATGACCTTAA AACCTACAAT AAAATAAGTA ATCTTAATTT GTCTTCTAAT CAGAAATCAA 
AGAAATTTGA ATTCAAGTCA ACGGATATCT ACAATGATGG ATACGAGACT GTGAGCTGTT 
TCCTAACAAC AGATCTCAAA AAATACTGTC TTAATTGGAG ATATGAATCA ACAGCTCTAT 
TTGGAGAAAC TTGCAACCAA ATATTTGGAT TAAATAAATT GTTTAATTGG TTACACCCTC 
GTCTTGAAGG AAGTACAATC TATGTAGGTG ATCCTTACTG TCCTCCATCA GATAAAGAAC 
ATATATCATT AGAGGATCAC CCTGATTCTG GTTTTTACGT TCATAACCCA AGAGGGGGTA 
TAGAAGGATT TTGTCAAAAA TTATGGACAC TCATATCTAT AAGTGCAATA CATCTAGCAG 
CTGTTAGAAT AGGCGTGAGG GTGACTGCAA TGGTTCAAGG AGACAATCAA GCTATAGCTG 
TAACCACAAG AGTACCCAAC AATTATGACT ACAGAGTTAA GAAGGAGATA GTTTATAAAG 
ATGTAGTGAG ATTTTTTGAT TCATTAAGAG AAGTGATGGA TGATCTAGGT CATGAACTTA 
AATTAAATGA AACGATTATA AGTAGCAAGA TGTTCATATA TAGCAAAAGA ATCTATTATG 
ATGGGAGAAT TCTTCCTCAA GCTCTAAAAG CATTATCTAG ATGTGTCTTC TGGTCAGAGA 
CAGTAATAGA CGAAACAAGA TCAGCATCTT CAAATTTGGC AACATCATTT GCAAAAGCAA 
TTGAGAATGG TTATTCACCT GTTCTAGGAT ATGCATGCTC AATTTTTAAG AACATTCAAC 
AACTATATAT TGCCCTTGGG ATGAATATCA ATCCAACTAT AACACAGAAT ATCAGAGATC 
AGTATTTTAG GAATCCAAAT TGGATGCAAT ATGCCTCTTT AATACCTGCT AGTGTTGGGG 
GATTCAATCA CATGGCCATG TCAAGATGTT TTGTAAGGAA TATTGGTGAT CCATCAGTTG 
CCGCATTGGC TGATATTAAA AGATTTATTA AGGCGAATCT ATTAGACCGA AGTGTTCTTT 
ATAGGATTAT GAATCAAGAA CCAGGTGAGT CATCTTTTTT TGACTGGGCT TCAGATCCAT 
ATTCATGCAA TTTACCACAA TCTCAAAATA TAACCACCAT GATAAAAAAT ATAACAGCAA 
GGAATGTATT ACAAGATTCA CCAAATCCAT TATTATCTGG ATTATTCACA AATACAATGA 
TAGAAGAAGA TGAAGAATTA GCTGAGTTCC TGATGGACAG GAAGGTAATT CTCCCTAGAG 
TTGCACATGA TATTCTAGAT AATTCTCTCA CAGGAATTAG AAATGCCATA GCTGGAATGT 
TAGATACGAC AAAATCACTA ATTCGGGTTG GCATAAATAG AGGAGGACTG ACATATAGTT 
TGTTGAGGAA AATCAGTAAT TACGATCTAG TACAATATGA AACACTAAGT AGGACTTTGC 
GACTAATTGT AAGTGATAAA ATCAAGTATG AAGATATGTG TTCGGTAGAC CTTGCCATAG 
CATTGCGACA AAAGATGTGG ATTCATTTAT CAGGAGGAAG GATGATAAGT GGACTTGAAA 
CGCCTGACCC ATTAGAATTA CTATCTGGGG TAGTAATAAC AGGATCAGAA CATTGTAAAA 
TATGTTATTC TTCAGATGGC ACAAACCCAT ATACTTGGAT GTATTTACCC GGTAATATCA 
AAATAGGATC AGCAGAAACA GGTATATCGT CATTAAGAGT TCCTTATTTT GGATCAGTCA 
CTGATGAAAG ATCTGAAGCA CAATTAGGAT ATATCAAGAA TCTTAGTAAA CCTGCAAAAG 
CCGCAATAAG AATAGCAATG ATATATACAT GGGCATTTGG TAATGATGAG ATATCTTGGA 
TGGAAGCCTC ACAGATAGCA CAAACACGTG CAAATTTTAC ACTAGATAGT CTCAAAATTT 
TAACACCGGT AGCTACATCA ACAAATTTAT CACACAGATT TAAGGATACT GCAACTCAGA 
TGAAATTCTC CAGTACATCA TTGATCAGAG TCAGCAGATT TATAACAATG TCCAATGATA 
ACATGTCTAT CAAAGAAGCT AATGAAACCA AAGATACTAA TCTTATTTAT CAACAAATAA 
TGTTAACAGG ATTAAGTGTT TTCGAATATT TATTTAGATT AAAAGAAACC ACAGGACACA 
ACCCTATAGT TATGCATCTG CACATAGAAG ATGAGTGTTG TATTAAAGAA AGTTTTAATG 
ATGAACATAT TAATCCAGAG TCTACATTAG AATTAATTCG ATATCCTGAA AGTAATGAAT 
TTATTTATGA TAAAGACCCA CTCAAAGATG TGGACTTATC AAAACTTATG GTTATTAAAG 
ACCATTCTTA CACAATTGAT ATGAATTATT GGGATGATAC TGACATCATA CATGCAATTT 
CAATATGTAC TGCAATTACA ATAGCAGATA CTATGTCACA ATTAGATCGA GATAATTTAA
AAGAGATAAT AGTTATTGCA AATGATGATG ATATTAATAG CTTAATCACT GAATTTTTGA 
CTCTTGACAT ACTTGTATTT CTCAAGACAT TTGGTGGATT ATTAGTAAAT CAATTTGCAT 
ACACTCTTTA TAGTCTAAAA ATAGAAGGTA GGGATCTCAT TTGGGATTAT ATAATGAGAA 
CACTGAGAGA TACTTCCCAT TCAATATTAA AAGTATTATC TAATGCATTA TCTCATCCTA 
AAGTATTCAA GAGGTTCTGG GATTGTGGAG TTTTAAACCC TATTTATGGT CCTAATATTG 
CTAGTCAAGA CCAGATAAAA CTTGCCCTAT CTATATGTGA ATATTCACTA GATCTATTTA 
TGAGAGAATG GTTGAATGGT GTATCACTTG AAATATACAT TTGTGACAGC GATATGGAAG 
TTGCAAATGA TAGGAAACAA GCCTTTATTT CTAGACACCT TTCATTTGTT TGTTGTTTAG 
CAGAAATTGC ATCTTTCGGA CCTAACCTGT TAAACTTAAC ATACTTGGAG AGACTTGATC 
TATTGAAACA ATATCTTGAA TTAAATATTA AAGAAGACCC TACTCTTAAA TATGTACAAA 
TATCTGGATT ATTAATTAAA TCGTTCCCAT CAACTGTAAC ATACGTAAGA AAGACTGCAA 
TCAAATATCT AAGGATTCGC GGTATTAGTC CACCTGAGGT AATTGATGAT TGGGATCCGG 
TAGAAGATGA AAATATGCTG GATAACATTG TCAAAACTAT AAATGATAAC TGTAATAAAG 
ATAATAAAGG GAATAAAATT AACAATTTCT GGGGACTAGC ACTTAAGAAC TATCAAGTCC 
TTAAAATCAG ATCTATAACA AGTGATTCTG ATGATAATGA TAGACTAGAT GCTAATACAA 
GTGGTTTGAC ACTTCCTCAA GGAGGGAATT ATCTATCGCA TCAATTGAGA TTATTCGGAA 
TCAACAGCAC TAGTTGTCTG AAAGCTCTTG AGTTATCACA AATTTTAATG AAGGAAGTCA 
ATAAAGACAA GGACAGGCTC TTCCTGGGAG AAGGAGCAGG AGCTATGCTA GCATGTTATG 
ATGCCACATT AGGACCTGCA GTTAATTATT ATAATTCAGG TTTGAATATA ACAGATGTAA 
TTGGTCAACG AGAATTGAAA ATATTTCCTT CAGAGGTATC ATTAGTAGGT AAAAAATTAG
GAAATGTGAC ACAGATTCTT AACAGGGTAA AAGTACTGTT CAATGGGAAT CCTAATTCAA 
CATGGATAGG AAATATGGAA TGTGAGAGCT TAATATGGAG TGAATTAAAT GATAAGTCCA 
TTGGATTAGT ACATTGTGAT ATGGAAGGAG CTATCGGTAA ATCAGAAGAA ACTGTTCTAC 
ATGAACATTA TAGTGTTATA AGAATTACAT ACTTGATTGG GGATGATGAT GTTGTTTTAG 
TTTCCAAAAT TATACCTACA ATCACTCCGA ATTGGTCTAG AATACTTTAT CTATATAAAT 
TATATTGGAA AGATGTAAGT ATAATATCAC TCAAAACTTC TAATCCTGCA TCAACAGAAT
TATATCTAAT TTCGAAAGAT GCATATTGTA CTATAATGGA ACCTAGTGAA ATTGTTTTAT 
CAAAACTTAA AAGATTGTCA CTCTTGGAAG AAAATAATCT ATTAAAATGG ATCATTTTAT 
CAAAGAAGAG GAATAATGAA TGGTTACATC ATGAAATCAA AGAAGGAGAA AGAGATTATG 
GAATCATGAG ACCATATCAT ATGGCACTAC AAATCTTTGG ATTTCAAATC AATTTAAATC 
ATCTGGCGAA AGAATTTTTA TCAACCCCAG ATCTGACTAA TATCAACAAT ATAATCCAAA 
GTTTTCAGCG AACAATAAAG GATGTTTTAT TTGAATGGAT TAATATAACT CATGATGATA 
AGAGACATAA ATTAGGCGGA AGATATAACA TATTCCCACT GAAAAATAAG GGAAAGTTAA 
GACTGCTATC GAGAAGACTA GTATTAAGTT GGATTTCATT ATCATTATCG ACTCGATTAC 
TTACAGGTCG CTTTCCTGAT GAAAAATTTG AACATAGAGC ACAGACTGGA TATGTATCAT 
TAGCTGATAC TGATTTAGAA TCATTAAAGT TATTGTCGAA AAACATCATT AAGAATTACA 
GAGAGTGTAT AGGATCAATA TCATATTGGT TTCTAACCAA AGAAGTTAAA ATACTTATGA 
AATTGATTGG TGGTGCTAAA TTATTAGGAA TTCCCAGACA ATATAAAGAA CCCGAAGACC 
AGTTATTAGA AAACTACAAT CAACATGATG AATTTGATAT CGATTAAAAC ATAAATACAA 
TGAAGATATA TCCTAACCTT TATCTTTAAG CCTAGGAATA GACAAAAAGT AAGAAAAACA 
TGTAATATAT ATATACCAAA CAGAGTTCTT CTCTTGTTTG GT 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2233 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

Met Asp Thr Glu Ser Asn Asn Gly Thr Val Ser Asp Ile Leu Tyr P
1 5 10 15

Glu Cys His Leu Asn Ser Pro Ile Val Lys Gly Lys Ile Ala Gln L
20 25 30

His Thr Ile Met Ser Leu Pro Gln Pro Tyr Asp Met Asp Asp Asp Ser
35 40 45

Ile Leu Val Ile Thr Arg Gln Lys Ile Lys Leu Asn Lys Leu Asp Lys
50 55 60

Arg Gln Arg Ser Ile Arg Arg Leu Lys Leu Ile Leu Thr Glu Lys Val
65 70 75 80

Asn Asp Leu Gly Lys Tyr Thr Phe Ile Arg Tyr Pro Glu Met Ser Lys
85 90 95

Glu Met Phe Lys Leu Tyr Ile Pro Gly Ile Asn Ser Lys Val Thr Glu
100 105 110

Leu Leu Leu Lys Ala Asp Arg Thr Tyr Ser Gln Met Thr Asp Gly Leu
115 120 125

Arg Asp Leu Trp Ile Asn Val Leu Ser Lys Leu Ala Ser Lys Asn Asp
130 135 140

Gly Ser Asn Tyr Asp Leu Asn Glu Glu Ile Asn Asn Ile Ser Lys Val
145 150 155 160

His Thr Thr Tyr Lys Ser Asp Lys Trp Tyr Asn Pro Phe Lys Thr Trp
165 170 175

Phe Thr Ile Lys Tyr Asp Met Arg Arg Leu Gln Lys Ala Arg Asn Glu
180 185 190

Ile Thr Phe Asn Val Gly Lys Asp Tyr Asn Leu Leu Glu Asp Gln Lys
195 200 205

Asn Phe Leu Leu Ile His Pro Glu Leu Val Leu Ile Leu Asp Lys Gln
210 215 220

Asn Tyr Asn Gly Tyr Leu Ile Thr Pro Glu Leu Val Leu Met Tyr Cys
225 230 235 240

Asp Val Val Glu Gly Arg Trp Asn Ile Ser Ala Cys Ala Lys Leu Asp
245 250 255

Pro Lys Leu Gln Ser Met Tyr Gln Lys Gly Asn Asn Leu Trp Glu Val
260 265 270

Ile Asp Lys Leu Phe Pro Ile Met Gly Glu Lys Thr Phe Asp Val Ile
275 280 285

Ser Leu Leu Glu Pro Leu Ala Leu Ser Leu Ile Gln Thr His Asp Pro
290 295 300

Val Lys Gln Leu Arg Gly Ala Phe Leu Asn His Val Leu Ser Glu Met
305 310 315 320

Glu Leu Ile Phe Glu Ser Arg Glu Ser Ile Lys Glu Phe Leu Ser Val
325 330 335

Asp Tyr Ile Asp Lys Ile Leu Asp Ile Phe Asn Lys Ser Thr Ile Asp
340 345 350

Glu Ile Ala Glu Ile Phe Ser Phe Phe Arg Thr Phe Gly His Pro Pro
355 360 365

Leu Glu Ala Ser Ile Ala Ala Glu Lys Val Arg Lys Tyr Met Tyr Ile
370 375 380 

Gly Lys Gln Leu Lys Phe Asp Thr Ile Asn Lys Cys His Ala Ile Phe
385 390 395 400

Cys Thr Ile Ile Ile Asn Gly Tyr Arg Glu Arg His Gly Gly Gln Trp
405 410 415

Pro Pro Val Thr Leu Pro Asp His Ala His Glu Phe Ile Ile Asn Ala
420 425 430

Tyr Gly Ser Asn Ser Ala Ile Ser Tyr Glu Asn Ala Val Asp Tyr Tyr
435 440 445

Gln Ser Phe Ile Gly Ile Lys Phe Asn Lys Phe Ile Glu Pro Gln Leu
450 455 460

Asp Glu Asp Leu Thr Ile Tyr Met Lys Asp Lys Ala Leu Ser Pro Lys
465 470 475 480

Lys Ser Asn Trp Asp Thr Val Tyr Pro Ala Ser Asn Leu Leu Tyr Arg
485 490 495

Thr Asn Ala Ser Asn Glu Ser Arg Arg Leu Val Glu Val Phe Ile Ala
500 505 510

Asp Ser Lys Phe Asp Pro His Gln Ile Leu Asp Tyr Val Glu Ser Gly
515 520 525

Asp Trp Leu Asp Asp Pro Glu Phe Asn Ile Ser Tyr Ser Leu Lys Glu
530 535 540

Lys Glu Ile Lys Gln Glu Gly Arg Leu Phe Ala Lys Met Thr Tyr Lys
545 550 555 560

Met Arg Ala Thr Gln Val Leu Ser Glu Thr Leu Leu Ala Asn Asn Ile
565 570 575

Gly Lys Phe Phe Gln Glu Asn Gly Met Val Lys Gly Glu Ile Glu Leu
580 585 590

Leu Lys Arg Leu Thr Thr Ile Ser Ile Ser Gly Val Pro Arg Tyr Asn
595 600 605

Glu Val Tyr Asn Asn Ser Lys Ser His Thr Asp Asp Leu Lys Thr Tyr
610 615 620

Asn Lys Ile Ser Asn Leu Asn Leu Ser Ser Asn Gln Lys Ser Lys Lys
625 630 635 640

Phe Glu Phe Lys Ser Thr Asp Ile Tyr Asn Asp Gly Tyr Glu Thr Val
645 650 655

Ser Cys Phe Leu Thr Thr Asp Leu Lys Lys Tyr Cys Leu Asn Trp Arg
660 665 670

Tyr Glu Ser Thr Ala Leu Phe Gly Glu Thr Cys Asn Gln Ile Phe Gly
675 680 685

Leu Asn Lys Leu Phe Asn Trp Leu His Pro Arg Leu Glu Gly Ser Thr
690 695 700

Ile Tyr Val Gly Asp Pro Tyr Cys Pro Pro Ser Asp Lys Glu His Ile
705 710 715 720

Ser Leu Glu Asp His Pro Asp Ser Gly Phe Tyr Val His Asn Pro Arg
725 730 735

Gly Gly Ile Glu Gly Phe Cys Gln Lys Leu Trp Thr Leu Ile Ser Ile
740 745 750

Ser Ala Ile His Leu Ala Ala Val Arg Ile Gly Val Arg Val Thr Ala
755 760 765

Met Val Gln Gly Asp Asn Gln Ala Ile Ala Val Thr Thr Arg Val Pro
770 775 780

Asn Asn Tyr Asp Tyr Arg Val Lys Lys Glu Ile Val Tyr Lys Asp Val
785 790 795 800

Val Arg Phe Phe Asp Ser Leu Arg Glu Val Met Asp Asp Leu Gly His
805 810 815

Glu Leu Lys Leu Asn Glu Thr Ile Ile Ser Ser Lys Met Phe Ile Tyr
820 825 830

Ser Lys Arg Ile Tyr Tyr Asp Gly Arg Ile Leu Pro Gln Ala Leu Lys
835 840 845

Ala Leu Ser Arg Cys Val Phe Trp Ser Glu Thr Val Ile Asp Glu Thr
850 855 860

Arg Ser Ala Ser Ser Asn Leu Ala Thr Ser Phe Ala Lys Ala Ile Glu
865 870 875 880

Asn Gly Tyr Ser Pro Val Leu Gly Tyr Ala Cys Ser Ile Phe Lys Asn
885 890 895

Ile Gln Gln Leu Tyr Ile Ala Leu Gly Met Asn Ile Asn Pro Thr Ile
900 905 910

Thr Gln Asn Ile Arg Asp Gln Tyr Phe Arg Asn Pro Asn Trp Met Gln
915 920 925

Tyr Ala Ser Leu Ile Pro Ala Ser Val Gly Gly Phe Asn His Met Ala
930 935 940

Met Ser Arg Cys Phe Val Arg Asn Ile Gly Asp Pro Ser Val Ala Ala
945 950 955 960

Leu Ala Asp Ile Lys Arg Phe Ile Lys Ala Asn Leu Leu Asp Arg Ser
965 970 975

Val Leu Tyr Arg Ile Met Asn Gln Glu Pro Gly Glu Ser Ser Phe Phe
980 985 990

Asp Trp Ala Ser Asp Pro Tyr Ser Cys Asn Leu Pro Gln Ser Gln Asn
995 1000 1005

Ile Thr Thr Met Ile Lys Asn Ile Thr Ala Arg Asn Val Leu Gln Asp
1010 1015 1020

Ser Pro Asn Pro Leu Leu Ser Gly Leu Phe Thr Asn Thr Met Ile Glu
1025 1030 1035 1040

Glu Asp Glu Glu Leu Ala Glu Phe Leu Met Asp Arg Lys Val Ile Leu
1045 1050 1055

Pro Arg Val Ala His Asp Ile Leu Asp Asn Ser Leu Thr Gly Ile Arg
1060 1065 1070

Asn Ala Ile Ala Gly Met Leu Asp Thr Thr Lys Ser Leu Ile Arg Val
1075 1080 1085

Gly Ile Asn Arg Gly Gly Leu Thr Tyr Ser Leu Leu Arg Lys Ile Ser
1090 1095 1100

Asn Tyr Asp Leu Val Gln Tyr Glu Thr Leu Ser Arg Thr Leu Arg Leu
1105 1110 1115 1120

Ile Val Ser Asp Lys Ile Lys Tyr Glu Asp Met Cys Ser Val Asp Leu
1125 1130 1135

Ala Ile Ala Leu Arg Gln Lys Met Trp Ile His Leu Ser Gly Gly Arg
1140 1145 1150

Met Ile Ser Gly Leu Glu Thr Pro Asp Pro Leu Glu Leu Leu Ser Gly
1155 1160 1165



Val Val Ile Thr Gly Ser Glu His Cys Lys Ile Cys Tyr Ser Ser Asp
1170 1175 1180

Gly Thr Asn Pro Tyr Thr Trp Met Tyr Leu Pro Gly Asn Ile Lys Ile
1185 1190 1195 1200

Gly Ser Ala Glu Thr Gly Ile Ser Ser Leu Arg Val Pro Tyr Phe Gly
1205 1210 1215

Ser Val Thr Asp Glu Arg Ser Glu Ala Gln Leu Gly Tyr Ile Lys Asn
1220 1225 1230



Leu Ser Lys Pro Ala Lys Ala Ala Ile Arg Ile Ala Met Ile Tyr Thr
1235 1240 1245

Trp Ala Phe Gly Asn Asp Glu Ile Ser Trp Met Glu Ala Ser Gln Ile
1250 1255 1260

Ala Gln Thr Arg Ala Asn Phe Thr Leu Asp Ser Leu Lys Ile Leu Thr
1265 1270 1275 1280

Pro Val Ala Thr Ser Thr Asn Leu Ser His Arg Phe Lys Asp Thr Ala
1285 1290 1295

Thr Gln Met Lys Phe Ser Ser Thr Ser Leu Ile Arg Val Ser Arg Phe
1300 1305 1310

Ile Thr Met Ser Asn Asp Asn Met Ser Ile Lys Glu Ala Asn Glu Thr
1315 1320 1325

Lys Asp Thr Asn Leu Ile Tyr Gln Gln Ile Met Leu Thr Gly Leu Ser
1330 1335 1340

Val Phe Glu Tyr Leu Phe Arg Leu Lys Glu Thr Thr Gly His Asn Pro
1345 1350 1355 1360

Ile Val Met His Leu His Ile Glu Asp Glu Cys Cys Ile Lys Glu Ser
1365 1370 1375

Phe Asn Asp Glu His Ile Asn Pro Glu Ser Thr Leu Glu Leu Ile Arg
1380 1385 1390

Tyr Pro Glu Ser Asn Glu Phe Ile Tyr Asp Lys Asp Pro Leu Lys Asp
1395 1400 1405

Val Asp Leu Ser Lys Leu Met Val Ile Lys Asp His Ser Tyr Thr Ile
1410 1415 1420

Asp Met Asn Tyr Trp Asp Asp Thr Asp Ile Ile His Ala Ile Ser Ile
1425 1430 1435 1440

Cys Thr Ala Ile Thr Ile Ala Asp Thr Met Ser Gln Leu Asp Arg Asp
1445 1450 1455

Asn Leu Lys Glu Ile Ile Val Ile Ala Asn Asp Asp Asp Ile Asn Ser
1460 1465 1470

Leu Ile Thr Glu Phe Leu Thr Leu Asp Ile Leu Val Phe Leu Lys Thr
1475 1480 1485

Phe Gly Gly Leu Leu Val Asn Gln Phe Ala Tyr Thr Leu Tyr Ser Leu
1490 1495 1500

Lys Ile Glu Gly Arg Asp Leu Ile Trp Asp Tyr Ile Met Arg Thr Leu
1505 1510 1515 1520

Arg Asp Thr Ser His Ser Ile Leu Lys Val Leu Ser Asn Ala Leu Ser
1525 1530 1535

His Pro Lys Val Phe Lys Arg Phe Trp Asp Cys Gly Val Leu Asn Pro
1540 1545 1550

Ile Tyr Gly Pro Asn Ile Ala Ser Gln Asp Gln Ile Lys Leu Ala Leu
1555 1560 1565

Ser Ile Cys Glu Tyr Ser Leu Asp Leu Phe Met Arg Glu Trp Leu Asn
1570 1575 1580

Gly Val Ser Leu Glu Ile Tyr Ile Cys Asp Ser Asp Met Glu Val Ala
1585 1590 1595 1600

Asn Asp Arg Lys Gln Ala Phe Ile Ser Arg His Leu Ser Phe Val Cys
1605 1610 1615

Cys Leu Ala Glu Ile Ala Ser Phe Gly Pro Asn Leu Leu Asn Leu Thr
1620 1625 1630

Tyr Leu Glu Arg Leu Asp Leu Leu Lys Gln Tyr Leu Glu Leu Asn Ile
1635 1640 1645

Lys Glu Asp Pro Thr Leu Lys Tyr Val Gln Ile Ser Gly Leu Leu Ile
1650 1655 1660

Lys Ser Phe Pro Ser Thr Val Thr Tyr Val Arg Lys Thr Ala Ile Lys
1665 1670 1675 1680

Tyr Leu Arg Ile Arg Gly Ile Ser Pro Pro Glu Val Ile Asp Asp Trp




WO 98/13501 PCT/US97/16718 




1685 1690 1695 

Asp Pro Val Glu Asp Glu Asn Met Leu Asp Asn Ile Val Lys Thr Ile
1700 1705 1710

Asn Asp Asn Cys Asn Lys Asp Asn Lys Gly Asn Lys Ile Asn Asn Phe
1715 1720 1725

Trp Gly Leu Ala Leu Lys Asn Tyr Gln Val Leu Lys Ile Arg Ser Ile
1730 1735 1740

Thr Ser Asp Ser Asp Asp Asn Asp Arg Leu Asp Ala Asn Thr Ser Gly
1745 1750 1755 1760

Leu Thr Leu Pro Gln Gly Gly Asn Tyr Leu Ser His Gln Leu Arg Leu
1765 1770 1775

Phe Gly Ile Asn Ser Thr Ser Cys Leu Lys Ala Leu Glu Leu Ser Gln
1780 1785 1790

Ile Leu Met Lys Glu Val Asn Lys Asp Lys Asp Arg Leu Phe Leu Gly
1795 1800 1805

Glu Gly Ala Gly Ala Met Leu Ala Cys Tyr Asp Ala Thr Leu Gly Pro
1810 1815 1820

Ala Val Asn Tyr Tyr Asn Ser Gly Leu Asn Ile Thr Asp Val Ile Gly
1825 1830 1835 1840

Gln Arg Glu Leu Lys Ile Phe Pro Ser Glu Val Ser Leu Val Gly Lys
1845 1850 1855

Lys Leu Gly Asn Val Thr Gln Ile Leu Asn Arg Val Lys Val Leu Phe
1860 1865 1870

Asn Gly Asn Pro Asn Ser Thr Trp Ile Gly Asn Met Glu Cys Glu Ser
1875 1880 1885



Leu Ile Trp Ser Glu Leu Asn Asp Lys Ser Ile Gly Leu Val His Cys
1890 1895 1900

Asp Met Glu Gly Ala Ile Gly Lys Ser Glu Glu Thr Val Leu His Glu
1905 1910 1915 1920

His Tyr Ser Val Ile Arg Ile Thr Tyr Leu Ile Gly Asp Asp Asp Val
1925 1930 1935

Val Leu Val Ser Lys Ile Ile Pro Thr Ile Thr Pro Asn Trp Ser Arg
1940 1945 1950



Ile Leu Tyr Leu Tyr Lys Leu Tyr Trp Lys Asp Val Ser Ile Ile Ser
1955 1960 1965

Leu Lys Thr Ser Asn Pro Ala Ser Thr Glu Leu Tyr Leu Ile Ser Lys
1970 1975 1980

Asp Ala Tyr Cys Thr Ile Met Glu Pro Ser Glu Ile Val Leu Ser Lys
1985 1990 1995 2000

Leu Lys Arg Leu Ser Leu Leu Glu Glu Asn Asn Leu Leu Lys Trp Ile
2005 2010 2015

Ile Leu Ser Lys Lys Arg Asn Asn Glu Trp Leu His His Glu Ile Lys
2020 2025 2030

Glu Gly Glu Arg Asp Tyr Gly Ile Met Arg Pro Tyr His Met Ala Leu
2035 2040 2045

Gln Ile Phe Gly Phe Gln Ile Asn Leu Asn His Leu Ala Lys Glu Phe
2050 2055 2060

Leu Ser Thr Pro Asp Leu Thr Asn Ile Asn Asn Ile Ile Gln Ser Phe
2065 2070 2075 2080

Gln Arg Thr Ile Lys Asp Val Leu Phe Glu Trp Ile Asn Ile Thr His
2085 2090 2095

Asp Asp Lys Arg His Lys Leu Gly Gly Arg Tyr Asn Ile Phe Pro Leu
2100 2105 2110

Lys Asn Lys Gly Lys Leu Arg Leu Leu Ser Arg Arg Leu Val Leu Ser
2115 2120 2125

Trp Ile Ser Leu Ser Leu Ser Thr Arg Leu Leu Thr Gly Arg Phe Pro
2130 2135 2140

Asp Glu Lys Phe Glu His Arg Ala Gln Thr Gly Tyr Val Ser Leu Ala
2145 2150 2155 2160

Asp Thr Asp Leu Glu Ser Leu Lys Leu Leu Ser Lys Asn Ile Ile Lys
2165 2170 2175



Asn Tyr Arg Glu Cys Ile Gly Ser Ile Ser Tyr Trp Phe Leu Thr Lys
2180 2185 2190

Glu Val Lys Ile Leu Met Lys Leu Ile Gly Gly Ala Lys Leu Leu Gly
2195 2200 2205

Ile Pro Arg Gln Tyr Lys Glu Pro Glu Asp Gln Leu Leu Glu Asn Tyr
2210 2215 2220

Asn Gln His Asp Glu Phe Asp Ile Asp
2225 2230 
(2) INFORMATION FOR SEQ ID NO: 23:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15218 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 
ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGAAAA AAATGGGGCA AATAAGAACT 
TGATAAGTGC TATTTAAGTC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 
TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 
CATGTTATAC TGATAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GCAATACATA 
CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 
ATAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATACTACAA AATGGAGGAT 
ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATT AAACGGTTTA ATGGATGATA 
ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 
AAATATCTGA CTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTCAAT 
AGACATGTGT TTATTACCAT TTTAGTTAAT ATAAAAACTC ATCAAAGGGA AATGGGGCAA 
ATAAACTCAC CTAATCAATC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 
ATTGATGATC ACAGACATGA GACCCCTGTC AATGGATTCA ATAATAACAT CTCTTACCAA
AGAAATCATC ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 
TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TACTGCACAA 
AGTAGGGAGT ACCAAATACA AAAAATACAC TGAATATAAT ACAAAATATG GCACTTTCCC 
CATGCCTATA TTTATCAATC ACGGCGGGTT TCTAGAATGT ATTGGCATTA AGCCTACAAA 
ACACACTCCT ATAATATACA AATATGACCT CAACCCGTGA ATTCCAACAA AAAAACCAAC 
CCAACCAAAC CAAACTATTC CTCAAACAAC AGTGCTCAAT AGTTAAGAAG GAGCTAATCC 
ATTTTAGTAA TTAAAAATAA AAGTAAAGCC AATAACATAA ATTGGGGCAA ATACAAAGAT 
GGCTCTTAGC AAAGTCAAGT TGAATGATAC ATTAAATAAG GATCAGCTGC TGTCATCCAG
CAAATACACT ATTCAACGTA GTACAGGAGA TAATATTGAC ACTCCCAATT ATGATGTGCA 
AAAACACCTA AACAAACTAT GTGGTATGCT ATTAATCACT GAAGATGCAA ATCATAAATT 
CACAGGATTA ATAGGTATGT TATATGCTAT GTCCAGGTTA GGAAGGGAAG ACACTATAAA 
GATACTTAAA GATGCTGGAT ATCATGTTAA AGCTAATGGA GTAGATATAA CAACATATCG 
TCAAGATATA AATGGAAAGG AAATGAAATT CGAAGTATTA ACATTATCAA GCTTGACATC 
AGAAATACAA GTCAATATTG AGATAGAATC TAGAAAGTCC TACAAAAAAA TGCTAAAAGA 
GATGGGAGAA GTGGCTCCAG AATATAGGCA TGATTCTCCA GACTGTGGGA TGATAATACT 
GTGTATAGCT GCACTTGTGA TAACCAAATT AGCAGCAGGA GACAGATCAG GTCTTACAGC 
AGTAATTAGG AGGGCAAACA ATGTCTTAAA AAACGAAATA AAACGATACA AGGGCCTCAT 
ACCAAAGGAT ATAGCTAACA GTTTTTATGA AGTGTTTGAA AAACACCCTC ATCTTATAGA 
TGTTTTCGTG CACTTTGGCA TTGCACAATC ATCCACAAGA GGGGGTAGTA GAGTTGAAGG 
AATCTTTGCA GGATTGTTTA TGAATGCCTA TGGTTCAGGG CAAGTAATGC TAAGATGGGG 
AGTTTTAGCC AAATCTGTAA AAAATATCAT GCTAGGACAT GCTAGTGTCC AGGCAGAAAT 
GGAGCAAGTT GTGGAAGTCT ATGAGTATGC ACAGAAGTTG GGAGGAGAAG CTGGATTCTA 
CCATATATTG AACAATCCAA AAGCATCATT GCTGTCATTA ACTCAATTTC CCAACTTCTC 
AAGTGTGGTC CTAGGCAATG CAGCAGGTCT AGGCATAATG GGAGAGTATA GAGGTACACC 
AAGAAACCAG GATCTTTATG ATGCAGCTAA AGCATATGCA GAGCAACTCA AAGAAAATGG 
AGTAATAAAC TACAGTGTAT TAGACTTAAC AGCAGAAGAA TTGGAAGCCA TAAAGCATCA 
ACTCAACCCC AAAGAAGATG ATGTAGAGCT TTAAGTTAAC AAAAAATACG GGGCAAATAA 
GTCAACATGG AGAAGTTTGC ACCTGAATTT CATGGAGAAG ATGCAAATAA CAAAGCTACC 
AAATTCCTAG AATCAATAAA GGGCAAGTTC GCATCATCCA AAGATCCTAA GAAGAAAGAT 
AGCATAATAT CTGTTAACTC AATAGATATA GAAGTAACTA AAGAGAGCCC GATAACATCT 
GGCACCAACA TCATCAATCC AACAAGTGAA GCCGACAGTA CCCCAGAAAC AAAAGCCAAC 
TACCCAAGAA AACCCCTAGT AAGCTTCAAA GAAGATCTCA CCCCAAGTGA CAACCCTTTT
TCTAAGTTGT ACAAGGAAAC AATAGAAACA TTTGATAACA ATGAAGAAGA ATCTAGCTAC 
TCATATGAAG AGATAAATGA TCAAACAAAT GACAACATTA CAGCAAGACT AGATAGAATT
GATGAAAAAT TAAGTGAAAT ATTAGGAATG CTCCATACAT TAGTAGTTGC AAGTGCAGGA 
CCCACTTCAG CTCGCGATGG AATAAGAGAT GCTATGGTTG GTCTAAGAGA AGAGATGATA 
GAAAAAATAA GAGCGGAAGC ATTAATGACC AATGATAGGT TAGAGGCTAT GGCAAGACTT 
AGGAATGAGG AAAGCGAAAA AATGGCAAAA GACACCTCAG ATGAAGTGTC TCTTAATCCA 
ACTTCCAAAA AATTGAGTGA CTTGTTGGAA GACAACGATA GTGACAATGA TCTATCACTT 
GATGATTTTT GATCAGCGAT CAACTCACTC AGCAATCAAC AACATCAATA AAACAGACAT 
CAATCCATTG AATCAACTGC CAGACCGAAC AAACAAACGT CCATCAGTAG AACCACCAAC
CAATCAATCA ACCAATTGAT CAATCAGCAA CCCGACAAAA TTAACAATAT AGTAACAAAA 
AAAGAACAAG ATGGGGCAAA TATGGAAACA TACGTGAACA AGCTTCACGA AGGCTCCACA 
TACACAGCAG CTGTTCAGTA CAATGTTCTA GAAAAAGATG ATGATCCTGC ATCACTAACA 
ATATGGGTGC CTATGTTCCA GTCATCTGTG CCAGCAGACT TGCTCATAAA AGAACTTGCA 
AGCATCAATA TACTAGTGAA GCAGATCTCT ACGCCCAAAG GACCTTCACT ACGAGTCACG 
ATTAACTCAA GAAGTGCTGT GCTGGCTCAA ATGCCTAGTA ATTTCATCAT AAGCGCAAAT 
GTATCATTAG ATGAAAGAAG CAAATTAGCA TATGATGTAA CTACACCTTG TGAAATCAAA 
GCATGCAGTC TAACATGCTT AAAAGTAAAA AGTATGTTAA CTACAGTCAA AGATCTTACC 
ATGAAGACAT TCAACCCCAC TCATGAGATC ATTGCTCTAT GTGAATTTGA AAATATTATG 
ACATCAAAAA GAGTAATAAT ACCAACCTAT CTAAGATCAA TTAGTGTCAA GAACAAGGAT 
CTGAACTCAC TAGAAAATAT AGCAACCACC GAATTCAAAA ATGCTATCAC CAATGCAAAA 
ATTATTCCTT ATGCAGGATT AGTGTTAGTT ATCACAGTTA CTGACAATAA AGGAGCATTC 
AAATATATCA AACCACAGAG TCAATTTATA GTAGATCTTG GTGCCTACCT AGAAAAAGAG 
AGCATATATT ATGTGACTAC TAATTGGAAG CATACAGCTA CACGTTTTTC AATCAAACCA 
CTAGAGGATT AAACTTAATT ATCAACACTG AATGACAGGT CCACATATAT CCTCAAACTA 
CACACTATAT CCAAACATCA TAAACATCTA CACTACACAC TTCATCACAC AAACCAATCC 
CACTCAAAAT CCAAAATCAC TACCAGCCAC TATCTGCTAG ACCTAGAGTG CGAATAGGTA 
AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTTAACAACC 
ATTTATACCG CCAATTCAAC ACATATACTA TAAATCTTAA AATGGGAAAT ACATCCATCA
CAATAGAATT CACAAGCAAA TTTTGGCCCT ATTTTACACT AATACATATG ATCTTAACTC 
TAATCTTTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 
ATAAAGCATT CTGTAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGA 
GTTCTACCAT TATGCTGTGT CAAATTATAA TCCTGTATAT ATAAACAAAC AAATCCAATC 
TTCTCACAGA GTCATGGTGT CGCAAAACCA CGCTAACTAT CATGGTAGCA TAGAGTAGTT
ATTTAAAAAT TAACATAATG ATGAATTGTT AGTATGAGAT CAAAAACAAC ATTGGGGCAA 
ATGCAACCAT GTCCAAACAC AAGAATCAAC GCACTGCCAG GACTCTAGAA AAGACCTGGG 
ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 
TAGCACAAAT AGCACTATCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 
CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTC ACAGTTCAAA 
CAATAAAAAA CCACACTGAA AAAAACATCA CCACCTACCC TACTCAAGTC TCACCAGAAA 
GGGTTAGTTC ATCCAAGCAA CCCACAACCA CATCACCAAT CCACACAAGT TCAGCTACAA 
CATCACCCAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAACCA 
CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAACCACG TCCAAAAAAT CCACCAAAAA 
AAGATGATTA CCATTTTGAA GTGTTCAACT TCGTTCCCTG CAGTATATGT GGCAACAATC 
AACTTTGCAA ATCCATCTGC AAAACAATAC CAAGCAACAA ACCAAAGAAG AAACCAACCA 
TCAAACCCAC AAACAAACCA ACCACCAAAA CCACAAACAA AAGAGACCCA AAAACACCAG 
CCAAAACGAC GAAAAAAGAA ACTACCACCA ACCCAACAAA AAAACTAACC CTCAAGACCA 
CAGAAAGAGA CACCAGCACC TCACAATCCA CTGCACTCGA CACAACCACA TTAAAACACA 
CAGTCCAACA GCAATCCCTC CTCTCAACCA CCCCCGAAAA CACACCCAAC TCCACACAAA 
CACCCACAGC ATCCGAGCCC TCCACACCAA ACTCCACCCA AAAAACCCAG CCACATGCTT 
AGTTATTCAA AAACTACATC TTAGCAGAGA ACCGTGATCT ATCAAGCAAG AACGAAATTA 
AACCTGGGGC AAATAACCAT GGAGTTGATG ATCCACAAGT CAAGTGCAAT CTTCCTAACT 
CTTGCTATTA ATGCATTGTA CCTCACCTCA AGTCAGAACA TAACTGAGGA GTTTTACCAA 
TCGACATGTA GTGCAGTTAG CAGAGGTTAT TTTAGTGCTT TAAGAACAGG TTGGTATACT 
AGTGTCATAA CAATAGAATT AAGTAATATA AAAGAAACCA AATGCAATGG AACTGACACT
AAAGTAAAAC TTATGAAACA AGAATTAGAT AAGTATAAGA ATGCAGTAAC AGAATTACAG 
CTACTTATGC AAAACACACC AGCTGTCAAC AACCGGGCCA GAAGAGAAGC ACCACAGTAT 
ATGAACTACA CAATCAATAC CACTAAAAAC CTAAATGTAT CAATAAGCAA GAAGAGGAAA 
CGAAGATTTC TAGGCTTCTT GTTAGGTGTG GGATCTGCAA TAGCAAGTGG TATAGCTGTA 
TCAAAAGTTC TACACCTTGA AGGAGAAGTG AACAAGATCA AAAATGCTTT GTTGTCTACA 
AACAAAGCTG TAGTCAGTTT ATCAAATGGG GTCAGTGTTT TAACCAGCAA AGTGTTAGAT 
CTCAAGAATT ACATAAATAA CCAATTATTA CCCATAGTAA ATCAACAGAG CTGTCGCATC 
TCCAACATTG AAACAGTTAT AGAATTCCAG CAGAAGAACA GCAGATTGTT GGAAATCACC 
AGAGAATTTA GTGTCAATGC AGGTGTAACA ACACCTTTAA GCACTTACAT GTTGACAAAC 
AGTGAGTTAC TATCATTAAT CAATGATATG CCTATAACAA ATGATCAGAA AAAATTAATG 
TCAAGCAATG TTCAGATAGT AAGGCAACAA AGTTATTCCA TCATGTCTAT AATAAAGGAA 
GAAGTCCTTG CATATGTTGT ACAGCTGCCT ATCTATGGTG TAATAGATAC ACCTTGCTGG 
AAATTGCACA CATCGCCTCT ATGCACTACC AACATCAAAG AAGGATCAAA TATTTGTTTA 
ACAAGGACTG ATAGAGGATG GTATTGTGAT AATGCAGGAT CAGTATCCTT CTTTCCACAG 
GCTGACACTT GTAAAGTACA GTCCAATCGA GTATTTTGTG ACACTATGAA CAGTTTGACA 
TTACCAAGTG AAGTCAGCCT TTGTAACACT GACATATTCA ATTCCAAGTA TGACTGCAAA 
ATTATGACAT CAAAAACAGA CATAAGCAGC TCAGTAATTA CTTCTCTTGG AGCTATAGTG 
TCATGCTATG GTAAAACTAA ATGCACTGCA TCCAACAAAA ATCGTGGGAT TATAAAGACA 
TTTTCTAATG GTTGTGACTA TGTGTCAAAC AAAGGAGTAG ATACTGTGTC AGTGGGCAAC 
ACTTTATACT ATGTAAACAA GCTGGAAGGC AAGAACCTTT ATGTAAAAGG GGAACCTATA 
ATAAATTACT ATGACCCTCT AGTGTTTCCT TCTGATGAGT TTGATGCATC AATATCTCAA 
GTCAATGAAA AAATCAATCA AAGTTTAGCT TTTATTCGTA GATCTGATGA ATTACTACAT 
AATGTAAATA CTGGCAAATC TACTACAAAT ATTATGATAA CTACAATTAT TATAGTAATC 
ATTGTAGTAT TGTTATCATT AATAGCTATT GGTTTACTGT TGTATTGTAA AGCCAAAAAC
ACACCAGTTA CACTAAGCAA AGACCAACTA AGTGGAATCA ATAATATTGC ATTCAGCAAA 
TAGACAAAAA ACCACCTGAT CATGTTTCAA CAACAATCTG CTGACCACCA ATCCCAAATC
AACTTACAAC AAATATTTCA ACATCACAGT ACAGGCTGAA TCATTTCCTC ACATCATGCT 
ACCCACATAA CTAAGCTAGA TCCTTAACTT ATAGTTACAT AAAAACCTCA AGTATCACAA
TCAACCACTA AATCAACACA TCATTCACAA AATTAACAGC TGGGGCAAAT ATGTCGCGAA
GAAATCCTTG TAAATTTGAG ATTAGAGGTC ATTGCTTGAA TGGTAGAAGA TGTCACTACA 
GTCATAATTA CTTTGAATGG CCTCCTCATG CATTACTAGT GAGGCAAAAC TTCATGTTAA 
ACAAGATACT CAAGTCAATG GACAAAAGCA TAGACACTTT GTCTGAAATA AGTGGAGCTG 
CTGAACTGGA TAGAACAGAA GAATATGCTC TTGGTATAGT TGGAGTGCTA GAGAGTTACA 
TAGGATCTAT AAACAACATA ACAAAACAAT CAGCATGTGT TGCTATGAGT AAACTTCTTA 
TTGAGATCAA TAGTGATGAC ATTAAAAAGC TTAGAGATAA TGAAGAACCC AATTCACCTA 
AGATAAGAGT GTACAATACT GTTATATCAT ACATTGAGAG CAATAGAAAA AACAACAAGC 
AAACCATCCA TCTGCTCAAG AGACTACCAG CAGACGTGCT GAAGAAGACA ATAAAGAACA 
CATTAGATAT CCACAAAAGC ATAACCATAA GCAATCCAAA AGAGTCAACT GTGAATGATC 
AAAATGACCA AACCAAAAAT AATGATATTA CCGGATAAAT ATCCTTGTAG TATATCATCC 
ATATTGATCT CAAGTGAAAG CATGGTTGCT ACATTCAATC ATAAAAACAT ATTACAATTT 
AACCATAACT ATTTGGATAA CCACCAGCGT TTATTAAATC ATATATTTGA TGAAATTCAT 
TGGACACCTA AAAACTTATT AGATGCCACT CAACAATTTC TCCAACATCT TAACATCCCT 
GAAGATATAT ATACAGTATA TATATTAGTG TCATAATGCT TGACCATAAC GACTCTATGT 
CATCCAACCA TAAAACTATT TTGATAAGGT TATGGGACAA AATGGATCCC ATTATTAATG 
GAAACTCTGC TAATGTGTAT CTAACTGATA GTTATTTAAA AGGTGTTATC TCTTTTTCAG 
AGTGTAATGC TTTAGGGAGT TATCTTTTTA ACGGCCCTTA TCTTAAAAAT GATTACACCA 
ACTTAATTAG TAGACAAAGC CCACTACTAG AGCATATGAA TCTTAAAAAA CTAACTATAA 
CACAGTCATT AATATCTAGA TATCATAAAG GTGAACTGAA ATTAGAAGAA CCAACTTATT 
TCCAGTCATT ACTTATGACA TATAAAAGTA TGTCCTCGTC TGAACAAATT GCTACAACTA 
ACTTACTTAA AAAAATAATA CGAAGAGCCA TAGAAATAAG TGATGTAAAG GTGTACGCCA 
TCTTGAATAA ACTAGGATTA AAGGAAAAGG ACAGAGTTAA GCCCAACAAT AATTCAGGTG 
ATGAAAACTC AGTACTTACA ACCATAATTA AAGATGATAT ACTTTCGGCT GTGGAAAACA
ATCAATCATA TACAAATTCA GACAAAAGTC ACTCAGTAAA TCAAAATATC ACTATCAAAA 
CAACACTCTT GAAAAAATTG ATGTGTTCAA TGCAACATCC TCCATCATGG TTAATACACT 
GGTTCAATTT ATATACAAAA TTAAATAACA TATTAACACA ATATCGATCA AATGAGGTAA 
AAAGTCATGG GTTTATATTA ATAGATAATC AAACTTTAAG TGGTTTTCAG TTTATTTTAA 
ATCAATATGG TTGTATCGTT TATCATAAAG GACTCAAAAA AATCACAACT ACTACTTACA 
ATCAATTTTT GACATGGAAA GACATCAGCC TTAGCAGATT AAATGTTTGC TTAATTACTT 
GGATAAGTAA TTGTTTAAAT ACATTAAACA AAAGCTTAGG GCTGAGATGT GGATTCAATA 
ATGTTGTGTT ATCACAATTA TTTCTTTATG GAGATTGTAT ACTGAAATTA TTTCATAATG 
AAGGCTTCTA CATAATAAAA GAAGTAGAGG GATTTATTAT GTCTTTAATT CTAAACATAA 
CAGAAGAAGA TCAATTTAGG AAACGATTTT ATAATAGCAT GCTAAATAAC ATCACAGATG 
CAGCTATTAA GGCTCAAAAG GACCTACTAT CAAGAGTATG TCACACTTTA TTAGACAAGA 
CAGTGTCTGA TAATATCATA AATGGTAAAT GGATAATCCT ATTAAGTAAA TTTCTTAAAT 
TGATTAAGCT TGCAGGTGAT AATAATCTCA ATAACTTGAG TGAGCTATAT TTTCTCTTCA 
GAATCTTTGG ACATCCAATG GTCGATGAAA GACAAGCAAT GGATTCTGTA AGAATTAACT 
GTAATGAAAC TAAGTTCTAC TTATTAAGTA GTCTAAGTAC ATTAAGAGGT GCTTTCATTT 
ATAGAATCAT AAAAGGGTTT GTAAATACCT ACAACAGATG GCCCACCTTA AGGAATGCTA 
TTGTCCTACC TCTAAGATGG TTAAACTACT ATAAACTTAA TACTTATCCA TCTCTACTTG 
AAATCACAGA AAATGATTTG ATTATTTTAT CAGGATTGCG GTTCTATCGT GAGTTTCATC 
TGCCTAAAAA AGTGGATCTT GAAATGATAA TAAATGACAA AGCCATTTCA CCTCCAAAAG 
ATCTAATATG GACTAGTTTT CCTAGAAATT ACATGCCATC ACATATACAA AATTATATAG 
AACATGAAAA GTTGAAGTTC TCTGAAAGCG ACAGATCGAG AAGAGTACTA GAGTATTACT 
TGAGAGATAA TAAATTCAAT GAATGCGATC TATACAATTG TGTAGTCAAT CAAAGCTATC 
TCAACAACTC TAATCACGTG GTATCACTAA CTGGTAAAGA AAGAGAGCTC AGTGTAGGTA 
GAATGTTTGC TATGCAACCA GGTATGTTTA GGCAAATCCA AATCTTAGCA GAGAAAATGA 
TAGCTGAAAA TATTTTACAA TTCTTCCCTG AGAGTTTGAC AAGATATGGT GATCTAGAGC 
TTCAAAAGAT ATTAGAATTA AAAGCAGGAA TAAGCAACAA GTCAAATCGT TATAATGATA
ACTACAACAA TTATATCAGT AAATGTTCTA TCATTACAGA TCTTAGCAAA TTCAATCAGG 
CATTTAGATA TGAAACATCA TGTATCTGCA GTGATGTATT AGATGAACTG CATGGAGTAC 
AATCTCTGTT CTCTTGGTTG CATTTAACAA TACCTCTTGT CACAATAATA TGTACATATA 
GACATGCACC TCCTTTCATA AAGGATCATG TTGTTAATCT TAATGAGGTT GATGAACAAA 
GTGGATTATA CAGATATCAT ATGGGTGGTA TTGAGGGCTG GTGTCAAAAA CTGTGGACCA 
TTGAAGCTAT ATCATTATTA GATCTAATAT CTCTCAAAGG GAAATTCTCT ATCACAGCTC 
TGATAAATGG TGATAATCAG TCAATTGATA TAAGCAAACC AGTTAGACTT ATAGAGGGTC 
AGACCCATGC ACAAGCAGAT TATTTGTTAG CATTAAATAG CCTTAAATTG TTATATAAAG 
AGTATGCAGG TATAGGCCAT AAGCTTAAGG GAACAGAGAC CTATATATCC CGAGATATGC 
AGTTCATGAG CAAAACAATC CAGCACAATG GAGTGTACTA TCCAGCCAGT ATCAAAAAAG 
TCCTGAGAGT AGGTCCATGG ATAAACACGA TACTTGATGA TTTTAAAGTT AGTTTAGAAT 
CTATAGGCAG CTTAACACAG GAGTTAGAAT ACAGAGGAGA AAGCTTATTA TGCAGTTTAA 
TATTTAGGAA CATTTGGTTA TACAATCAAA TTGCTTTGCA ACTCCGAAAT CATGCATTAT 
GTAACAATAA GCTATATTTA GATATATTGA AAGTATTAAA ACACTTAAAA ACTTTTTTTA 
ATCTTGATAG CATTGATATG GCTTTATCAT TGTATATGAA TTTGCCTATG CTGTTTGGTG 
GTGGTGATCC TAATTTGTTA TATCGAAGCT TTTATAGGAG AACTCCAGAC TTCCTTACAG 
AAGCTATAGT ACATTCAGTG TTTGTGTTGA GCTATTATAC TGGTCACGAT TTACAAGATA 
AGCTCCAGGA TCTTCCAGAT GATAGACTGA ACAAATTCTT GACATGTGTC ATCACATTTG 
ATAAAAATCC CAATGCCGAG TTTGTAACAT TGATGAGGGA TCCACAGGCT TTAGGGTCTG 
AAAGGCAAGC TAAAATTACT AGTGAGATTA ATAGATTAGC AGTAACAGAA GTCTTAAGTA 
TAGCCCCAAA CAAAATATTT TCTAAAAGTG CACAACATTA TACTACCACT GAGATTGATC 
TAAATGACAT TATGCAAAAT ATAGAACCAA CTTACCCTCA TGGATTAAGA GTTGTTTATG 
AAAGTTTACC TTTTTATAAA GCAGAAAAAA TAGTTAATCT TATATCAGGA ACAAAATCCA 
TAACTAATAT ACTTGAAAAA ACATCAGCAA TAGATACAAC TGATATTAAT AGGGCTACTG 
ATATGATGAG GAAAAATATA ACTTTACTTA TAAGGATACT TCCACTAGAT TGTAACAAAG 
ACAAAAGAGA GTTATTAAGT TTAGAAAATC TTAGTATAAC TGAATTAAGC AAGTATGTAA
GAGAAAGATC TTGGTCATTA TCCAATATAG TAGGAGTAAC ATCGCCAAGT ATTATGTTCA 
CAATGGACAT TAAATATACA ACTAGCACTA TAGCCAGTGG TATAATAATA GAAAAATATA 
ATGTTAATAG TTTAACTCGT GGTGAAAGAG GACCCACCAA GCCATGGGTA GGCTCATCCA 
CGCAGGAGAA AAAAACAATG CCAGTGTACA ACAGACAAGT TTTAACCAAA AAGCAAAGAG 
ACCAAATAGA TTTATTAGCA AAATTAGACT GGGTATATGC ATCCATAGAC AACAAAGATG 
AATTCATGGA AGAACTGAGT ACTGGAACAC TTGGACTGTC ATATGAAAAA GCCAAAAAGT 
TGTTTCCACA ATATCTAAGT GTCAATTATT TACACCGTTT AACAGTCAGT AGTAGACCAT 
GTGAATTCCC TGCATCAATA CCAGCTTATA GAACAACAAA TTATCATTTT GATACTAGTC 
CTATCAATCA TGTATTAACA GAAAAGTATG GAGATGAAGA TATCGACATT GTGTTTCAAA 
ATTGCATAAG TTTTGGTCTT AGCCTGATGT CGGTTGTGGA ACAATTCACA AACATATGTC 
CTAATAGAAT TATTCTCATA CCGAAGCTGA ATGAGATACA TTTGATGAAA CCTCCTATAT 
TTACAGGAGA TGTTGATATC ATCAAGTTGA AGCAAGTGAT ACAAAAGCAG CACATGTTCC 
TACCAGATAA AATAAGTTTA ACCCAATATG TAGAATTATT CTTAAGTAAC AAAGCACTTA 
AATCTGGATC TCACATCAAC TCTAATTTAA TATTAGTACA TAAAATGTCT GATTATTTTC 
ATAATGCTTA TATTTTAAGT ACTAATTTAG CTGGACATTG GATTCTGATT ATTCAACTTA 
TGAAAGATTC AAAAGGTATT TTTGAAAAAG ATTGGGGAGA GGGGTACATA ACTGATCATA 
TGTTCATTAA TTTGAATGTT TTCTTTAATG CTTATAAGAC TTATTTGCTA TGTTTTCATA 
AAGGTTATGG TAAAGCAAAA TTAGAATGTG ATATGAACAC TTCAGATCTT CTTTGTGTTT 
TGGAGTTAAT AGACAGTAGC TACTGGAAAT CTATGTCTAA AGTTTTCCTA GAACAAAAAG 
TCATAAAATA CATAGTCAAT CAAGACACAA GTTTGCGTAG AATAAAAGGC TGTCACAGTT 
TTAAGTTGTG GTTTTTAAAA CGCCTTAATA ATGCTAAATT TACCGTATGC CCTTGGGTTG 
TTAACATAGA TTATCACCCA ACACACATGA AAGCTATATT ATCTTACATA GATTTAGTTA 
GAATGGGGTT AATAAATGTA GATAAATTAA CCATTAAAAA TAAAAACAAA TTCAATGATG 
AATTTTACAC ATCAAATCTC TTTTACATTA GTTATAACTT TTCAGACAAC ACTCATTTGC 
TAACAAAACA AATAAGAATT GCTAATTCAG AATTAGAAGA TAATTATAAC AAACTATATC 
ACCCAACCCC AGAAACTTTA GAAAATATGT CATTAATTCC TGTTAAAAGT AATAATAGTA
ACAAACCTAA ATTTTGTATA AGTGGAAATA CCGAATCTAT GATGATGTCA ACATTCTCTA 
GTAAAATGCA TATTAAATCT TCCACTGTTA CCACAAGATT CAATTATAGC AAACAAGACT 
TGTACAATTT ATTTCCAATT GTTGTGATAG ACAAGATTAT AGATCATTCA GGTAATACAG 
CAAAATCTAA CCAACTTTAC ACCACCACTT CACATCAGAC ATCTTTAGTA AGGAATAGTG 
CATCACTTTA TTGCATGCTT CCTTGGCATC ATGTCAATAG ATTTAACTTT GTATTTAGTT 
CCACAGGATG CAAGATCAGT ATAGAGTATA TTTTAAAAGA TCTTAAGATT AAGGACCCCA 
GTTGTATAGC ATTCATAGGT GAAGGAGCTG GTAACTTATT ATTACGTACG GTAGTAGAAC 
TTCATCCAGA CATAAGATAC ATTTACAGAA GTTTAAAAGA TTGCAATGAT CATAGTTTAC 
CTATTGAATT TCTAAGGTTA TACAACGGGC ATATAAACAT AGATTATGGT GAGAATTTAA 
CCATTCCTGC TACAGATGCA ACTAATAACA TTCATTGGTC TTATTTACAT ATAAAATTTG 
CAGAACCTAT TAGCATCTTT GTCTGCGATG CTGAATTACC TGTTACAGCC AATTGGAGTA 
AAATTATAAT TGAATGGAGT AAGCATGTAA GAAAGTGCAA GTACTGTTCT TCTGTAAATA 
GATGCATTTT AATTGCAAAA TATCATGCTC AAGATGACAT TGATTTCAAA TTAGATAACA 
TTACTATATT AAAAACTTAC GTGTGCCTAG GTAGCAAGTT AAAAGGATCT GAAGTTTACT 
TAATCCTTAC AATAGGCCCT GCAAATATAC TTCCTGTTTT TGATGTTGTA CAAAATGCTA 
AATTGACACT TTCAAGAACT AAAAATTTCA TTATGCCTAA AAAAACTGAC AAGGAATCTA 
TCGATGCAAA TATTAAAAGC TTAATACCTT TCCTTTGTTA CCCTATAACA AAAAAAGGAA 
TTAAGACTTC ATTGTCAAAA TTGAAGAGTG TAGTTAATGG AGATATATTA TCATATTCTA 
TAGCTGGACG TAATGAAGTA TTCAGCAACA AGCTTATAAA CCACAAGCAT ATGAATATCC 
TAAAATGGCT AGATCATGTT TTAAATTTTA GATCAGCTGA ACTTAATTAC AATCATTTAT 
ACATGATAGA GTCCACATAT CCTTACTTAA GTGAATTGTT AAATAGTTTA ACAACCAATG 
AGCTCAAGAA GCTGATTAAA ATAACAGGTA GTGTGCTATA CAACCTTCCC AACGAACAGT 
AGTTTAAAAT ATCATTAACA AGTTTGGTCA AATTTAGATG CTAACACATC ATTATATTAT 
AGTTATTAAA AAATATACAA ACTTTTCAAT AATTTAGCAT ATTGATTCCA AAATTATCAT 
TTTAGTCTTA AGGGGTTAAA TAAAAGTCTA AAACTAACAA TTATACATGT GCATTCACAA 
CACAACGAGA CATTAGTTTT TGACACTTTT TTTCTCGT 15218 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp 
500 505 510 

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu 
660 665 670 

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu 
770 775 780 

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly 
785 790 795 800 

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg 
850 855 860 

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr 
885 890 895 

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr 
900 905 910 

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe 
915 920 925 

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile 
1060 1065 1070 

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met 
1155 1160 1165 



Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 



Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn 
1395 1400 1405 

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile 
1425 1430 1435 1440 

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile 
1460 1465 1470 

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile 
1765 1770 1775 

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg 
1860 1865 1870 

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Thr Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp 
2035 2040 2045 

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys 
2050 2055 2060 

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gln 
2165 



(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15229 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

ACGCGAAAAA ATGCGTACTA CAAACTTGCA CATTCGGAAA AAATGGGGCA AATAAGAATT 60 

TGATAAGTGC TATTTAAATC TAACCTTTTC AATCAGAAAT GGGGTGCAAT TCACTGAGCA 120 

TGATAAAGGT TAGATTACAA AATTTATTTG ACAATGACGA AGTAGCATTG TTAAAAATAA 180 

CATGTTATAC TGACAAATTA ATTCTTCTGA CCAATGCATT AGCCAAAGCA GTAATACATA 240 

CAATTAAATT AAACGGCATA GTTTTTATAC ATGTTATAAC AAGCAGTGAA GTGTGCCCTG 300 

ACAACAATAT TGTAGTGAAA TCTAACTTTA CAACAATGCC AATATTACAA AACGGAGGAT 360 

ACATATGGGA ATTGATTGAG TTGACACACT GCTCTCAATC AAATGGTCTA ATGGATGATA 420 

ATTGTGAAAT CAAATTTTCT AAAAGACTAA GTGACTCAGT AATGACTAAT TATATGAATC 480 

AAATATCTGA TTTACTTGGG CTTGATCTCA ATTCATGAAT TATGTTTAGT CTAATTTAAT 540 

AGACATGTGT TTATCACCAT TTTAGTTAAT ATAAAACCTC ATCAAAGGGA AATGGGGCAA 600 

ATAAACTCAC CTAATCAGTC AAACCATGAG CACTACAAAT GACAACACTA CTATGCAAAG 660 

ATTGATGATC ACAGACATGA GACCCCTGTC GATGGAATCA ATAATAACAT CTCTCACCAA 720 

AGAAATCATA ACACACAAAT TCATATACTT GATAAACAAT GAATGTATTG TAAGAAAACT 780 

TGATGAAAGA CAAGCTACAT TTACATTCTT AGTCAATTAT GAGATGAAGC TATTGCACAA 840 
AGTAGGGAGT ACCAAATACA AGAAATACAC TGAATATAAT ACAAAATATG GCACTTTCCC 
CATGCCTATA TTTATCAATC ATGACGGGTT TCTAGAATGT ATTGGCATTA AGCCTACAAA 
ACACACTCCT ATAATATACA AATATGACCT CAACCCGTAA ATTCCAACAA AAAACTAACC 
CATCCAAACT AAGCTATTCC TCAAACAACA GTGCTCAACA GTTAAGAAGG AGCTAATCCA 
TTTTAGTAAT TAAAAATAAA GGCAGAGCCA ATAACATAAA TTGGGGCAAA TACAAAGATG 
GCTCTTAGCA AAGTCAAGTT AAATGATACA TTAAATAAGG ATCAGCTGCT GTCATCCAGC 
AAATACACTA TTCAACGTAG TACAGGAGAT AATATTGAGA CTCCCAATTA TGATGTGCAA 
AAACACCTAA ACAAACTATG TGGTATGCTA TTAATCACTG AAGATGCAAA TCATAAATTC 
ACAGGATTAA TAGGTATGTT ATATGCTATG TCCAGGTTAG GAAGGGAAGA CACTATAAAG 
ATACTTAAAG ATGCTGGATA TCATGTTAAA GCTAATGGAG TAGATATAAC AACATATCGT 
CAAGATATAA ACGGAAAGGA AATGAAATTC GAAGTATTAA CATTATCAAG CTTGACATCA 
GAAATACAAG TCAATATTGA GATAGAATCT AGAAAGTCCT ACAAAAAAAT GCTAAAAGAG 
ATGGGAGAAG TGGCTCCAGA ATATAGGCAT GATTCTCCAG ACTGTGGGAT GATAATACTG 
TGTATAGCTG CACTTGTAAT AACCAAGTTA GCAGCAGGAG ATAGATCAGG TCTTACAGCA 
GTAATTAGGA GGGCAAACAA TGTCTTAAAA AACGAAATAA AACGCTACAA GGGCCTCATA 
CCAAAGGATA TAGCTAACAG TTTTTATGAA GTGTTTGAAA AACACCCTCA TCTTATAGAT 
GTTTTTGTGC ACTTTGGCAT TGCACAATCA TCCACAAGAG GGGGTAGTAG AGTTGAAGGA 
ATCTTTGCAG GATTATTTAT GAATGCCTAT GGTTCAGGGC AAGTAATGCT AAGATGGGGA 
GTTCTAGCCA AATCTGTAAA AAATATCATG CTAGGACATG CTAGTGTCCA GGCAGAAATG 
GAACAAGTTG TGGAAGTTTA TGAGTATGCA CAGAAGTTGG GAGGAGAAGC TGGATTCTAC 
CATATATTGA ACAATCCAAA AGCATCATTG CTGTCATTAA CTCAATTTCC TAACTTCTCA 
AGTGTGGTCC TAGGCAATGC AGCAGGTCTA GGCATAATGG GAGAGTATAG AGGTACACCA 
AGAAACCAAG ATCTATATGA TGCAGCCAAA GCATATGCAG AGCAACTCAA AGAAAATGGA 
GTAATAAACT ACAGTGTATT AGACTTAACA GCAGAAGAAT TGGAAGCCAT AAAGCATCAA 
CTCAACCCCA AAGAAGATGA TGTAGAGCTT TAAGTTAACA AAAAATACGG GGCAAATAAG 
TCAACATGGA GAAGTTTGCA CCTGAATTTC ATGGAGAAGA TGCAAACAAC AAAGCTACCA 
AATTCCTAGA ATCAATAAAG GGCAAGTTTG CATCATCCAA AGATCCTAAG AAGAAAGATA 
GCATAATATC TGTTAACTCA ATAGATATAG AAGTAACTAA AGAGAGCCCG ATAACATCTG 
GCACCAACAT CATCAATCCA ATAAGTGAAG CTGATAGTAC CCCAGAAGCT AAAGCCAACT 
ACCCAAGAAA ACCCCTAGTA AGCTTCAAAG AAGATCTCAC CCCAAGTGAC AACCCCTTTT 
CTAAGTTGTA CAAAGAAACA ATAGAAACAT TTGATAACAA TGAAGAAGAA TCTAGCTACT 
CATATGAAGA AATAAATGAT CAAACAAATG ACAACATTAC AGCAAGACTA GATAGAATTG 
ATGAAAAATT AAGTGAAATA TTAGGAATGC TCCATACATT AGTAGTTGCA AGTGCAGGAC 
CCACCTCAGC TCGCGATGGA ATAAGAGATG CTATGGTTGG TCTAAGAGAA GAAATGATAG 
AAAAAATAAG AGCGGAAGCA TTAATGACCA ATGATAGGTT AGAGGCTATG GCAAGACTTA 
GGAATGAGGA AAGCGAAAAA ATGGCAAAAG ACACCTCAGA TGAAGTGTCT CTTAATCCAA 
CTTCCAAAAA ATTGAGTAAT TTGTTGGAAG ACAACGATAG TGACAATGAT CTATCACTTG 
ATGATTTTTG ATCAGTGATC AACTCACTCA GCAATCAACA ACATCAATGA AACAGACATC 
AATCCATTGA ATCAACTGCC AGACTGAACA CACAAACGTC CATCAGCAGA ACTACCAACC 
AATCAATCAA CCAATTGATC AATCAGCGAC CTAACAAAAT TAACAATATA GTAACAAAAA 
AAGAACAAGA TGGGGCAAAT ATGGAAACAT ACGTGAACAA GCTTCACGAG GGCTCCACAT 
ACACAGCAGC TGTTCAGTAC AATGTTCTAG AAAAAGATGA TGATCCTGCA TCACTAACAA 
TATGGGTGCC TATGTTCCAG TCATCTGTGC CAGCAGACTT GCTCATAAAA GAACTTGCAA 
GCATCAACAT ACTAGTGAAG CAGATCTCCA CGCCCAAAGG ACCTTCACTA CGAGTCACGA 
TTAACTCAAG AAGTGCTGTG CTGGCACAAA TGCCTAGTAG TTTTATCATA AGTGCAAATG 
TATCATTAGA TGAAAGAAGC AAATTAGCAT ATGATGTAAC TACACCTTGT GAAATCAAAG 
CATGCAGTCT AACATGCTTA AAAGTAAAAA GTATGTTAAC TACAGTCAAA GATCTTACCA 
TGAAAACATT CAATCCCACT CATGAGATTA TTGCTCTATG TGAATTTGAA AATATTATGA 
CATCAAAAAG AGTAATAATA CCAACCTATC TAAGATCAAT TAGTGTCAAA AACAAGGACC 
TGAACTCACT AGAAAATATA GCAACCACCG AATTCAAAAA TGCTATCACC AATGCGAAAA 
TTATTCCCTA TGCAGGATTA GTATTAGTTA TCACAGTTAC TGACAATAAA GGAGCATTCA 
AATATATCAA GCCACAGAGT CAATTTATAG TAGATCTTGG GGCCTACCTA GAAAAAGAGA 
GCATATATTA TGTGACTACA AATTGGAAGC ATACAGCTAC ACGTTTTTCA ATCAAACCAC 
TAGAGGATTA AACTTAATTA TCAACACTAA ATGACAGGTC CACATATATC TTCAAACTAT 
ACATTATATC CAAACATCAT GAGCATTTAC ACTACACACT TTTACCATAT AAATCAATCT 
CATTTAAAAT CCAAAATTAC TTCCAGCTAT CATCTGTTAG ACCTAGAGTG CGAATAGGTA 
AATAAAACCA AAATATGGGG TAAATAGACA TTAGTTAGAG TTCAATCAAT CTCAACAACC 
ATTTATACCG CCAATTCAGT ACATATACTA TAAATCTCAA AATGGGAAAT ACATCCATCA 
CAATAGAATT CACAAGCAAA TTTTGGCCTT ATTTTACACT AATACATATG ATCTTAACTC 
TAATCTCTTT ACTAATTATA ATCACTATTA TGATTGCAAT ACTAAATAAG CTAAGTGAAC 
ATAAAACATT CTGCAACAAA ACTCTTGAAC TAGGACAGAT GTATCAAATC AACACATAGT 
GTTCTACCAT TATGCTGTGT CAAATTATAA TCTTGTATAT ATAAACAAAC AAATCCAATC 
TTCTCACAGA GTCATGGTGG CGCAAAACCA CGCCAACCAT CATGATAGCA TAGAGTAGTT 
ATTTAAAAAT TAACATAATG ATGAATTATT GGTATGAGAT CAGGAACAAC ATTGGGGCAA 
ATGCAGCCAT GTCCAAGCAC AAGAATCGGC GCACTGCCGG GACTCTAGAA AGGACCTGGG 
ATACTCTTAA TCATCTAATT GTAATATCCT CTTGTTTATA CAGATTAAAT TTAAAATCTA 
TAGCACAAAT AGCACTGTCA GTTTTGGCAA TGATAATCTC AACCTCTCTC ATAATTGCAG 
CCATAATATT CATCATCTCT GCCAATCACA AAGTTACACT AACAACGGTT ACAGTTCAAA 
CAATAAAAAA CCACACTGAA AAAAACATCT CCACCTACCT TACTCAAGTC CCACCAGAAA 
GGGTCAACTC ATCCAAACAA CCCACAACCA CATCACCAAT CCACACAAAT TCAGCCACAA 
TATCACCAAA TACAAAATCA GAAACACACC ATACAACAGC ACAAACCAAA GGCAGAATCA 
CCACTTCAAC ACAGACCAAC AAGCCAAGCA CAAAATCACG TTCAAAAAAT CCACCAAAAA 
AACCAAAAGA TGATTACCAT TTTGAAGTGT TCAATTTTGT TCCCTGTAGT ATATGTGGTA 
ATAATCAACT CTGCAAATCC ATCTGCAAAA CAATACCAAG CAACAAACCA AAGAAAAAAC 
CAACCATCAA ACCCACAAAC AAACCAACCA CCAAAACCAC AAACAAAAGA GACCCCAAAA 
CACCAGCCAA AATGCCAAAA AAAGAAATCA TCACCAACCC AGCAAAAAAA CCAACCCTCA 
AGACCACAGA AAGAGACACC AGCATTTCAC AATCCACCGT GCTCGACACA ATCACTCCAA 
AATACACAAT CCAACAGCAA TCCCTCCACT CAACCACCTC CGAAAACACA CCCAGCTCCA 
CACAAATACC CACAGCATCC GAGCCCTCCA CATTAAATCC TAATTAAAAA ACCTAGTCAC 
ATGCTTAGTT ATTCAAAAAC TACATCTTAG CAGAGAACCG TGATCTATCA AGCAAGAACA 
AAATTAAACC TGGGGCAAAT AACCATGGAG TTGCTGATCC ACAGGTCAAG TGCAATCTTC 
CTAACTCTTG CTGTTAATGC ATTGTACCTC ACCTCAAGTC AGAACATAAC TGAGGAGTTT 
TACCAATCGA CATGTAGTGC AGTTAGCAGA GGTTATTTTA GTGCTTTAAG AACAGGTTGG 
TATACCAGTG TCATAACAAT AGAATTAAGT AATATAAAAG AAACCAAATG CAATGGAACT 
GACACTAAAG TAAAACTTAT AAAACAAGAA TTAGATAAGT ATAAGAATGC AGTAACAGAA 
TTACAGCTAC TTATGCAAAA CACGCCAGCT GCCAACAACC GGGCCAGAAG AGAAGCACCA 
CAGTACATGA ACTACACAAT CAATACCACA AAAAACCTAA ATGTATCAAT AAGCAAGAAA 
AGGAAACGAA GATTTCTGGG CTTCTTGTTA GGTGTAGGAT CTGCAATAGC AAGTGGTATA 
GCTGTATCCA AAGTTTTACA CCTTGAAGGA GAAGTGAACA AAATCAAAAA TGCTTTGTTG 
TCTACAAACA AAGCTGTAGT CAGTCTATCA AATGGGGTCA GTGTTTTAAC CAGCAAAGTG 
TTAGATCTCA AGAATTACAT AAATAACCGA ATATTACCCA TAGTAAATCA ACAGAGCTGT 
CGCATCTCCA ACATTGAAAC AGTTATAGAA TTCCAGCAGA AGAATAGCAG ATTGTTGGAA 
ATCACCAGAG AATTTAGTGT TAATGCAGGT GTAACAACAC CTTTAAGCAC TTACATGTTA 
ACAAACAGTG AGTTACTATC ATTGATCAAT GATATGCCTA TAACAAATGA CCAGAAAAAA 
TTAATGTCAA GCAATGTTCA GATAGTAAGG CAACAAAGTT ATTCTATCAT GTCTATAATA 
AAGGAAGAAG TCCTTGCATA TGTTGTACAG CTACCTATCT ATGGTGTAAT AGATACACCT 
TGCTGGAAAT TACACACATC ACCTCTATGC ACCACCAACA TCAAAGAAGG ATCAAATATT 
TGTTTAACAA GGACTGATAG AGGATGGTAT TGTGATAATG CAGGATCAGT ATCCTTCTTC 
CCACAGGCTG ATACTTGCAA AGTACAGTCC AATCGAGTAT TTTGTGACAC TATGAACAGT 
TTAACATTAC CAAGTGAAGT CAGCCTTTGT AACACTGACA TATTCAATTC CAAGTATGAC 
TGCAAAATTA TGACATCAAA AACAGACATA AGCAGCTCAG TAATTACTTC TCTTGGAGCT 
ATAGTGTCAT GCTATGGAAA AACTAAATGC ACTGCATCCA ATAAAAATCG TGGGATTATA 
AAGACATTTT CTAATGGTTG TGACTATGTG TCAAACAAAG GAGTAGATAC TGTGTCAGTG 
GGCAACACTT TATACTATGT AAACAAGCTG GAAGGCAAAA ACCTTTATGT AAAAGGGGAA 
CCTATAATAA ATTACTATGA TCCTCTAGTG TTTCCTTCTG ATGAGTTTGA TGCATCAATA 
TCTCAAGTCA ATGAAAAAAT CAATCAAAGT TTAGCTTTTA TTCGTAGATC TGATGAATTA 
CTACATAATG TAAATACTGG CAAATCTACT ACAAATATTA TGATAACTAC AATTATTATA 
GTAATCATTG TAGTATTGTT ATCATTAATA GCTATTGGTT TACTGTTGTA TTGCAAAGCC 
AAAAACACAC CAGTTACACT AAGCAAAGAC CAACTAAGTG GAATCAATAA TATTGCATTC 
AGCAAATAGA CAAAAAACTA CTTAATCATG TTTCAACAAC AATCTGCTGA CCACCAATCC 
CAAATCAACT TAACAACAAA TATTTCAACA TCATAGCACA GGCTGAATCA TTTCCTCATA 
TCATGCTACC TACACAACTA AGCTAGATCT TCAACTCATA GTTACATAAA AACCCCAAGT 
ATCACAATCA AACACTAAAT CGACACATCA TTCACAAAAT TAACAACTGG GGCAAATATG 
TCGCGAAGAA ATCCTTGTAA ATTTGAGATT AGAGGTCATT GCTTGAATGG TAGAAGATGT 
CACTACAGTC ATAATTATTT TGAATGGCCT CCTCATGCAT TACTAGTGAG GCAAAACTTC 
ATGTTAAACA AGATACTTAA GTCAATGGAC AAAAGCATAG ACACTTTGTC GGAAATAAGT 
GGAGCTGCTG AACTGGATAG AACAGAAGAA TATGCTCTTG GTATAGTTGG AGTGCTAGAG 
AGTTACATAG GATCAATAAA CAACATAACA AAACAATCAG CATGTGTTGC TATGAGTAAA 
CTTCTTATTG AGATCAACAG TGATGACATT AAAAAACTGA GAGATAACGA AGAACCCAAT 
TCGCCTAAGA TAAGAGTGTA CAATACTGTT ATATCATACA TTGAGAGCAA TAGAAAAAAC 
AACAAGCAAA CCATCCATCT GCTCAAAAGA CTACCAGCAG ACGTGCTGAA GAAGACAATA 
AAGAACACAT TAGATATCCA CAAAAGCATA ACCATAAGCA ACTCAAAAGA GTCAACCGTG 
AATGATCAAA ATGACCAAAC CAAAAATAAT GATATTACCG GATAAATATC CTTGTAGTAT 
ATCATCCATA TTGATTTCAA GTGAAAGCAT GATTGCTACA TTCAATCATA AAAACATATT 
ACAATTTAAC CATAACCATT TGGATAACCA CCAGTGTTTA TTAAATCATA TATTTGATGA 
AATTCATTGG ACACCTAAAA ACTTATTAGA TGCCACTCAA CAATTTCTCC AACATCTTAA 
CATCCCTGAA GATATATATA CAGTATATAT ATTAGTGTCA TAATGCTTGA CCATAACAAT 
TTTATATCAT TCAACCATAA AACAACCTTA ATAAGGTTAT GGGACAAAAT GGATCCCATT 
ATTAATGGAA ACTCTGCCAA TGTGTATCTA ACTGATAGTT ATCTAAAAGG TGTTATCTCT 
TTTTCAGAAT GTAATGCTTT AGGGAGTTAC CTTTTTAACG GCCCCTATCT TAAAAATGAT 
TACACCAACT TAATTAGTAG ACAAAGCCCA CTACTAGAGC ATATGAATCT AAAAAAACTA 8700 

ACTATAACAC AGTCATTAAT ATCTAGATAT CATAAAGGTG AACTGAAGTT AGAAGAACCA 8760 

ACTTATTTCC AGTCATTACT TATGACATAT AAAAGTATGT CCTCGTCTGA ACAAATTGCT 8820 

ACAACTAATT TACTTAAAAA AATAATACGA AGAGCTATAG AAATAAGTGA TGTAAAGGTG 8880 

TACGCCATCT TGAATAAACT GGGACTAAAG GAAAAGGACA GAGTTAAGCC CAACAATAAT 8940 

TCAGGTGATG AAAACTCAGT TCTTACAACC ATAATCAAAG ATGATATACT TTCAGCTGTG 9000 

GAAAACAATC AATCATATAC AAATTCAGAC AAAAATCATT CAGTAAATCA AAATATCACT 9060 

ATCAAAACAA CACTCTTGAA AAAATTGATG TGTTCAATGC AACATCCTCC ATCATGGTTA 9120 

ATACACTGGT TCAATTTATA TACAAAATTA AATAACATAT TAACACAATA TCGATCAAAT 9180 

GAGGTAAAAA GTCATGGGTT TATATTAATA GATAATCAAA CTTTAAGTGA TTTTCAGTTT 9240 

ATTTTAAATC AATATGGTTG TATCGTTTAT CATAAAGGAC TCAAAAAAAT CACAACTACT 9300 

ACTTACAATC AATTTTTGAC ATGGAAAGAC ATCAGCCTTA GCAGATTAAA TGTTTGCTTA 9360 

ATTACTTGGA TAAGTAATTG TTTAAATACA TTAAATAAAA GCTTAGGGCT GAGATGTGGA 9420 

TTCAATAATG TTGTGTTATC ACAACTATTT CTTTATGGAG ATTGTATACT GAAATTATTC 9480 

CATAATGAAG GCTTCTACAT AATAAAAGAA GTAGAGGGAT TTATTATGTC TTTAATTCTA 9540 

AACATAACAG AAGAAGATCA ATTTAGGAAA CGATTTTATA ATAGCATGCT AAATAACATC 9600 

ACAGATGCAG CTATTAAGGC TCAAAAAAAC CTACTATCAA GAGTATGTCA CACTTTATTA 9660 

GACAAGACAG TGTCTGATAA TATCATAAAT GGTAAATGGA TAATCCTATT AAGTAAATTT 9720 

CTTAAATTGA TTAAGCTTGC AGGTGATAAT AATCTCAATA ACTTGAGTGA GCTTTATTTT 9780 

CTCTTCAGAA TCTTTGGACA TCCAATGGTC GATGAAAGAC AAGCAATGGA TGCTGTAAGA 9840 

ATTAACTGTA ATGAAACCAA GTTCTACTTA TTAAGTAATC TAAGTACGTT AAGAGGTGCT 9900 

TTCATTTATA GAATCATAAA GGGGTTTGTA AATACCTACA ACAGATGGCC CACTTTAAGG 9960 

AATGCTATTG TTCTACCTCT AAGATGGTTG AACTATTATA AACTTAATAC TTATCCATCT 10020 

CTACTTGAAA TCACAGAGAA AGATTTGATT ATTTTATCAG GATTGCGGTT CTATCGTGAG 10080 

TTTCATCTGC CTAAAAAAGT GGATCTTGAA ATGATAATAA ATGACAAAGC CATTTCACCT 10140 

CCAAAAGATT TAATATGGAC TAGTTTTCCT AGAAATTACA TGCCATCACA TATACAAAAT 10200 






TATATAGAAC ATGAAAAGTT GAAGTTCTCT GAAAGTGACA GATCAAGAAG AGTACTAGAG 
TATTACTTGA GAGATAATAA ATTCAATGAA TGCGATCTAT ACAATTGTGT GGTCAATCAA 
AGCTATCTCA ACAACTCTAA CCATGTGGTA TCACTAACTG GTAAAGAAAG AGAGCTCAGT 
GTAGGTAGAA TGTTTGCTAT GCAACCAGGT ATGTTTAGGC AAATTCAAAT CTTAGCAGAG 
AAAATGATAG CCGAAAATAT TTTACAATTC TTCCCTGAGA GTTTGACAAG ATATGGTGAT 
CTAGAGCTTC AAAAGATATT AGAATTAAAA GCAGGAATAA GCAACAAGTC AAATCGTTAT 
AATGATAACT ACAACAATTA TATCAGTAAA TGTTCTATCA TTACAGACCT TAGCAAATTC 
AATCAAGCAT TTAGATATGA AACATCATGT ATCTGCAGTG ATGTATTAGA TGAACTGCAT 
GGAGTACAAT CTCTGTTCTC TTGGTTGCAT TTAACAATAC CTCTTGTCAC AATAATATGT 
ACATATAGAC ATGCACCTCC TTTTATAAAG GATCATGTTG TTAATCTTAA TAAAGTTGAT 
GAACAAAGTG GATTATACAG ATATCATATG GGTGGTATTG AAGGCTGGTG TCAAAAACTG 
TGGACCATTG AAGCTATATC ATTATTAGAT CTAATATCTC TCAAAGGGAA ATTCTCTATC 
ACAGCTCTAA TAAATGGTGA TAATCAGTCA ATTGATATAA GTAAACCAGT TAGACTTATA 
GAGGGTCAGA CCCATGCTCA AGCAGATTAT TTGTTAGCAT TAAATAGCCT TAAATTGCTA 
TATAAAGAGT ATGCGGGCAT AGGCCACAAG CTCAAGGGAA CAGAGACCTA TATATCCCGA 
GATATGCAAT TCATGAGCAA AACAATCCAG CACAATGGAG TGTACTATCC AGCCAGTATC 
AAAAAAGTCC TGAGAGTAGG TCCATGGATA AATACAATAC TTGATGATTT TAAAGTTAGT 
TTAGAATCTA TAGGTAGCTT AACACAGGAG TTAGAATATA GAGGAGAGAG CTTATTATGC 
AGTTTAATAT TTAGGAACAT TTGGTTATAC AATCAAATTG CTTTGCAACT CCGAAATCAT 
GCATTATGTC ACAATAAGCT ATATTTAGAT ATATTGAAAG TATTAAAACA CTTAAAAACT 
TTTTTTAATC TTGATAGTAT TGATATGGCT TTAACATTGT ATATGAATTT GCCTATGCTG 
TTTGGTGGTG GTGATCCTAA TTTGTTATAT CGAAGCTTTT ATAGGAGAAC TCCAGACTTC 
CTTACAGAAG CTATAGTACA TTCAGTGTTT GTGTTGAGCT ATTATACTGG TCACGATTTA 
CAAGATAAGC TCCAGGATCT TCCAGATGAT AGACTGAACA AATTCTTGAC ATGTATCATC 
ACGTTTGATA AAAATCCCAA TGCCGAGTTT GTAACATTGA TGAGAGATCC ACAGGCTTTA 
GGGTCTGAAA GGCAAGCAAA AATTACTAGT GAGATTAATA GATTAGCAGT GACAGAAGTC 
TTAAGTATAG CTCCAAACAA AATATTTTCT AAAAGTGCAC AACATTATAC TACCACTGAG 
ATTGATCTAA ATGATATTAT GCAAAATATA GAACCAACTT ACCCTCATGG ATTAAGAGTT 
GTTTATGAAA GTTTACCTTT TTATAAAGCA GAAAAAATAG TTAATCTTAT ATCAGGAACA 
AAATCCATAA CTAATATACT TGAAAAAACA TCAGCAATAG ATTCAACTGA TATTAATAGG 
GCTACTGATA TGATGAGGAA AAATATAACT TTACTTATAA GGATACTTCC ACTAGATTGT 
AACAAAGACA AAAGAGAGTT ATTAAGTTTA GAAAATCTTA GTATAACTGA ATTAAGCAAG 
TATGTAAGAG AAAGATCTTG GTCGTTATCC AATATAGTAG GAGTAACATC GCCAAGTATT 
ATGTTCACAA TGGACATTAA ATATACAACT AGCACTATAG CCAGTGGTAT AATTATAGAA 
AAATATAATG TTAATAGTTT AACTCGTGGT GAAAGAGGAC CTACTAAGCC ATGGGTAGGT 
TCATCTACGC AGGAGAAAAA AACAATGCCA GTGTACAATA GACAAGTTTT AACCAAAAAG 
CAAAGAGACC AAATAGATTT ATTAGCAAAA TTAGACTGGG TATATGCATC CATAGACAAC 
AAAGATGAAT TCATGGAAGA ACTGAGTACT GGAACACTTG GACTGTCATA TGAGAAAGCC 
AAAAAATTGT TTCCACAATA TCTAAGTGTC AATTATTTAC ACCGCTTAAC AGTCAGTAGT 
AGACCATGTG AATTCCCTGC ATCAATACCA GCTTATAGAA CAACAAATTA TCATTTCGAT 
ACTAGTCCTA TCAACCATGT ATTAACAGAA AAGTATGGAG ATGAAGATAT CGACATTGTG 
TTTCAAAATT GCATAAGTTT TGGTCTTAGC TTAATGTCGG TTGTGGAACA ATTCACAAAC 
ATATGTCCTA ATAGAATTAT TCTCATACCG AAGCTGAATG AGATACATTT GATGAAACCT 
CCTATATTTA CAGGAGATGT TGATATCATC AAGTTGAAGC AAGTGATACA AAAACAGCAC 
ATGTTCCTAC CAGATAAAAT AAGTTTAACC CAATATGTAG AATTATTCCT AAGTAACAAA 
GCACTTAAAT CTGGATCTCA CATCAACTCT AATTTAATAT TAGTACATAA AATGTCTGAT 
TATTTTCATA ATGCTTATAT TTTAAGTACT AATTTAGCTG GACATTGGAT TCTGATTATT 
CAACTTATGA AGGATTCAAA AGGTATTTTT GAAAAAGATT GGGGAGAGGG GTATATAACT 
GATCATATGT TCATTAATTT GAATGTTTTC TTTAATGCTT ATAAGACTTA TTTGCTATGT 
TTTCATAAAG GTTATGGTAA AGCAAAATTA GAATGTGATA TGAACACTTC AGATCTTCTT 
TGTGTTTTGG AGCTAATAGA CAGTAGCTAC TGGAAATCTA TGTCTAAAGT TTTCCTAGAA 
CAAAAAGTCA TAAAATACAT AATCAATCAA GACACAAGTT TGCATAGAAT AAAAGGTTGT 
CATAGTTTTA AGTTATGGTT TTTAAAACGC CTTAATAATG CTAAATTTAC CGTATGCCCT 
TGGGTTGTTA ACATAGATTA TCACCCAACA CACATGAAAG CTATATTATC TTACATAGAT 
TTAGTTAGAA TGGGGTTAAT AAATGTAGAT AAATTAACCA TTAAAAATAA AAATAAATTC 
AATGATGAAT TTTACACATC AAATCTCTTT TACATTAGTT ATAACTTTTC AGATAACACT 
CATTTGCTAA CAAAACAAAT AAGAATTGCT AATTCAGAAT TAGAAAATAA TTATAACAAA 
CTATATCACC CAACCCCAGA AACTTTAGAA AATATGTCAT TAATTCCTGT CAAAAGTAAT 
AATAGTAATA AACCTAAATT TGGTATAAGT GGAAATACCG AATCTATGAT GACGTCAACA 
TTCTCCAATA AAACGCATAT TAAATCTTCC GCTGTTATTA CAAGATTCAA TTATAGTAAA 
CAAGACTTGT ACAATTTATT TCCAATTGTC GTGATAGACA GGATTATAGA TCATTCAGGT 
AATACAGCAA AATCTAACCA ACTCTACACT ACCACTTCAC ATCAGACATC TTTAGTAAGG 
AATAGTGCAT CACTTTATTG CATGCTTCCT TGGCATCATG TCAATAGATT TAACTTTGTA 
TTTAGTTCCA CAGGATGCAA GATCAGTATA GAGTATATTT TAAAAGATCT TAAGATTAAA 
GACCCCAGTT GTATAGCATT CATAGGTGAA GGAGCTGGTA ACTTATTATT ACGTACAGTA 
GTAGAACTTC ATCCAGACAT AAGATACATT TACAGAAGTT TAAAAGATTG CAATGATCAT 
AGTTTACCTA TTGAATTTCT AAGGTTATAC AACGGGCATA TAAACATAGA TTATGGTGAG 
AATTTAACCA TTCCTGCTAC AGATGCAACT AATAACATTC ATTGGTCTTA TTTACATATA 
AAATTTGCAG AACCTATTAG CATTTTTGTC TGCGATGCTG AATTACCTGT TACAGCCAAT 
TGGAGTAAAA TTATAATTGA ATGGAGTAAG CATGTAAGAA AGTGCAAGTA CTGTTCCTCT 
GTAAATAGAT GCATTTTAAT TGCAAAATAT CATGCCCAAG ATGATATTGA TTTCAAATTA 
GATAACATTA CTATATTAAA AACTTACGTG TGCCTAGGTA GCAAGTTAAA AGGATCTGAA 
GTTTACTTAG TCCTTACAAT AGGCCCTGCA AATATACTTC CTGTTTTTAA TGTTGTGCAA 
AATGCTAAAT TGATTCTTTC AAGGACTAAA AATTTCATTA TGCCTAAAAA AACTGACAAA 
GAATCTATCG ATGCAAATAT TAAAAGCTTA ATACCTTTCC TTTGTTACCC TATAACAAAA 
AAAGGAATTA AGACTTCATT GTCAAAATTG AAGAGTGTAG TTAGTGGAGA TATATTATCA 
TATTCTATAG CTGGACGTAA TGAAGTATTC AGCAACAAGC TTATAAACCA CAAGCATATG 
AATATCCTAA AATGGCTAGA TCATGTTTTA AACTTTAGAT CAGCTGAACT TAATTACAAT 
CATTTATATA TGATAGAGTC CACATATCCT TACTTAAGTG AATTGTTAAA CAGTTTAACA 14940 

ACCAATGAGC TCAAGAAGCT GATTAAAATA ACAGGTAGTG TACTATACAA CCTTCCCAAC 15000 

GAACAGTAAC TTAAAACATC ATTAACAAGT TTGATCAAAT TTAGATGCTA ACACATCATA 15060 

ATATTATAGT TATTAAAAAA TATATATGCA AACTTTTCAA TAATTTAGCA TATTGATTCC 15120 

AAAGTTATCA TTTTGGTCTT AAGGGGTTGA ATAAAAATCT AAAACTAACA ATTATACATG 15180 

TGCATTTACA ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15229 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile 
100 105 110 

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Asn 
165 170 175 

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser 
225 230 235 240 

Asp Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asn Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ala Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Asn Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Lys Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685



Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Lys Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys His Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Thr
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Ile Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Ser Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Ile Asn Gln Asp Thr Ser Leu His Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asn Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Gly Ile Ser Gly Asn Thr Glu Ser Met Met Thr Ser Thr
1730 1735 1740

Phe Ser Asn Lys Thr His Ile Lys Ser Ser Ala Val Ile Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Arg Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805



Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920



Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Val Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asn Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Ser Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240 

ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 300 

GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 360 

TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 420 

AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 480 

CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 540
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 600
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 2160
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA 
CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 
AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT 
CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAGGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 14640

ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 14700 

ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 14760 

ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 14820 

CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 14880 

TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 14940 

GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 15000 

TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 15060 

TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 15120 

TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 15180 

ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 15219 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Arg Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu
980 985 990

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala
995 1000 1005

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asn Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe
1315 1320 1325

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala
1540 1545 1550

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu
1555 1560 1565

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr
1700 1705 1710

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val
1810 1815 1820

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys
1955 1960 1965

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met
2115 2120 2125

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 60 

TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 120 

ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 180 

ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 240
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAAGTGAA 
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT 
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 
ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAG ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 
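
The nucleotide listing above can be spot-checked against a protein listing by translating a short stretch of it. The following is only a minimal sketch in Python, not part of the original listing. It assumes the standard genetic code, and it assumes that the open reading frame of interest begins at the ATG in the run `ATGGATCC CATTATTAAT GGAAACTCT` found in SEQ ID NO: 29; the names `CODON_TABLE`, `THREE_LETTER`, `translate` and `ORF_START` are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Standard genetic code in TCAG codon order (one-letter residues, * = stop).
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

# One-letter to three-letter residue codes, as used in the protein listings.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(fragment: str) -> str:
    """Translate a nucleotide fragment, read in frame from its first base.

    Whitespace (the 10-base grouping of the listing) is ignored, U is
    treated as T, and any trailing partial codon is dropped.
    """
    seq = "".join(fragment.split()).upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return " ".join(THREE_LETTER[CODON_TABLE[c]] for c in codons)


# Assumed start of the open reading frame, copied from SEQ ID NO: 29.
ORF_START = "ATGGATCC CATTATTAAT GGAAACTCT"

if __name__ == "__main__":
    print(translate(ORF_START))
```

Under those assumptions the fragment translates to the same nine residues that open SEQ ID NO: 30, which gives a quick way to catch transcription errors such as a digit in place of a base.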
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60

Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn
130 135 140

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly
290 295 300



Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala
450 455 460

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe
580 585 590

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750

Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr
835 840 845

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 



Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys
1125 1130 1135

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gin Arg Asp Gin He Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser He Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gin Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser He Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 

Tyr His Phe Asp Thr Ser Pro He Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp He Asp He Val Phe Gin Asn Cys He Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gin Phe Thr Asn He Cys Pro Asn 
1395 1400 1405 
Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro 
1410 1415 1420 

Pro He Phe Thr Gly Asp Val Asp He He Lys Leu Lys Gin Val He 
1425 1430 1435 1440 

Gin Lys Gin His Met Phe Leu Pro Asp Lys He Ser Leu Thr Gin Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His He 
1460 1465 1470 

Asn Ser Asn Leu He Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr He Leu Ser Thr Asn Leu Ala Gly His Trp He Leu He He 
1490 1495 1500 

Gin Leu Met Lys Asp Ser Lys Gly He Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr He Thr Asp His Met Phe He Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu He Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gin Lys Val He Lys Tyr He Val Asn Gin Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

He Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn He Asp Tyr His 
1620 1625 1630 

Pro Thr His Met Lys Ala He Leu Ser Tyr He Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu He Asn Val Asp Lys Leu Thr He Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr He Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gin He Arg He Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu He Pro Val Lys Ser Asn Asn Ser Asn Lys 
1715 1720 1725 

Pro Lys Phe Cys He Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr 
1730 1735 1740 

Phe Ser Ser Lys Met His He Lys Ser Ser Thr Val Thr Thr Arg Phe 
1745 1750 1755 1760 

Asn Tyr Ser Lys Gin Asp Leu Tyr Asn Leu Phe Pro He Val Val He 
1765 1770 1775 

Asp Lys He He Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gin Leu 
1780 1785 1790 

Tyr Thr Thr Thr Ser His Gin Thr Ser Leu Val Arg Asn Ser Ala Ser 
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys He Ser He Glu Tyr He Leu Lys Asp 
1825 1830 1835 1840 

Leu Lys He Lys Asp Pro Ser Cys He Ala Phe He Gly Glu Gly Ala 
1845 1850 1855 

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp He Arg 
1860 1865 1870 

Tyr He Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro He 
1875 1880 1885 

Glu Phe Leu Arg Leu Tyr Asn Gly His He Asn He Asp Tyr Gly Glu 
1890 1895 1900 

Asn Leu Thr He Pro Ala Thr Asp Ala Thr Asn Asn He His Trp Ser 
1905 1910 1915 1920 

Tyr Leu His He Lys Phe Ala Glu Pro He Ser He Phe Val Cys Asp 
1925 1930 1935 

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys He He He Glu Trp 
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 
Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu 
1970 1975 1980 

Asp Asn He Thr He Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu 
1985 1990 1995 2000 

Lys Gly Ser Glu Val Tyr Leu He Leu Thr He Gly Pro Ala Asn He 
2005 2010 2015 

Leu Pro Val Phe Asp Val Val Gin Asn Ala Lys Leu He Leu Ser Arg 
2020 2025 2030 

Thr Lys Asn Phe He Met Pro Lys Lys Thr Asp Lys Glu Ser He Asp 
2035 2040 2045 

Ala Asp He Lys Ser Leu He Pro Phe Leu Cys Tyr Pro He Thr Lys 
2050 2055 2060 

Lys Gly He Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly 
2065 2070 2075 2080 

Asp He Leu Ser Tyr Ser He Ala Gly Arg Asn Glu Val Phe Ser Asn 
2085 2090 2095 

Lys Leu He Asn His Lys His Met Asn He Leu Lys Trp Leu Asp His 
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

He Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr 
2130 2135 2140 

Thr Asn Glu Leu Lys Lys Leu He Lys He Thr Gly Ser Val Leu Tyr 
2145 2150 2155 2160 

Asn Leu Pro Asn Glu Gin 
2165 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 
TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 
ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA 
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC 
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT 
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG 
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCCGCTA GACCTAGAGT GCGAATAGGC 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC 
ACAATAGAAC TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT 
CTAATCTTTT TACTAATTAT AATCACTATC ATGATTGCAA CACTAAATAA GCTAAGTGAA 
CACAAAGCAT TCTGCAACAA AACTCTTGAA CTAGGACAGA TGTACCAAAT CAACACACAG 
AGTTCCACCA TTATGCTGTG TCAAACCATA ATCCTGTATA TACAAACAAA CAAATCCAAT 
CCTCTCACAG AGTCACGGTG TCGCAAAACC ACGCTAACCA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA 
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC 
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA 
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA 
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT 
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA 
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT 
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC 
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACCATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC 
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA 
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC 
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAA GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT 
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT 
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG 
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 10920 

CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 10980 

CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 11040 

GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 11100 

CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 11160 

GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 11220 

TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 11280 

ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 11340 

TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 11400 

AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 11460 

GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 11520 

GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 11580 

AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 11640 

GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 11700 

GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 11760 

ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 11820 

CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 11880 

GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 11940 

ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 12000 

GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 12060 

GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA 12120 

AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC 12180 

ACAATGAACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 12240 

AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 12300 

ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 12360 

GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 12420 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTAAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT 
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT 
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT 
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 



TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAA ATATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA 
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AGAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp 
1 5 10 15 

Ser Tyr Leu Lys Gly Val He Ser Phe Ser Glu Cys Asn Ala Leu Gly 
20 25 30 

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu 
35 40 45 

He Ser Arg Gin Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu 
50 55 60 

Thr He Thr Gin Ser Leu He Ser Arg Tyr His Lys Gly Glu Leu Lys 
65 70 75 80 

Leu Glu Glu Pro Thr Tyr Phe Gin Ser Leu Leu Met Thr Tyr Lys Ser 
85 90 95 

Met Ser Ser Ser Glu Gin He Ala Thr Thr Asn Leu Leu Lys Lys He 
100 105 110 

He Arg Arg Ala He Glu He Ser Asp Val Lys Val Tyr Ala He Leu 
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr He He Lys Asp Asp He 
145 150 155 160 

Leu Ser Ala Val Glu Asn Asn Gin Ser Tyr Thr Asn Ser Asp Lys Ser 
165 170 175 

His Ser Val Asn Gin Asn He Thr He Lys Thr Thr Leu Leu Lys Lys 
180 185 190 

Leu Met Cys Ser Met Gin His Pro Pro Ser Trp Leu He His Trp Phe 
195 200 205 

Asn Leu Tyr Thr Lys Leu Asn Asn He Leu Thr Gin Tyr Arg Ser Asn 
210 215 220 

Glu Val Lys Ser His Gly Phe He Leu He Asp Asn Gin Thr Leu Ser 
225 230 235 240 

Gly Phe Gin Phe He Leu Asn Gin Tyr Gly Cys He Val Tyr His Lys 
245 250 255 

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp 
260 265 270 

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile 
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile 
305 310 315 320 

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu 
325 330 335 

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe 
340 345 350 

Lys Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala 
355 360 365 

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu 
370 375 380 

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu 
385 390 395 400 

Leu Ser Lys Phe Leu Lys Leu He Lys Leu Ala Gly Asp Asn Asn Leu 
405 410 415 

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg He Phe Gly His Pro 
420 425 430 

Met Val Asp Glu Arg Gin Ala Met Asp Ser Val Arg He Asn Cys Asn 
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe He Tyr Arg He He Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp 
465 470 475 480 

Pro Thr Leu Arg Asn Ala He Val Leu Pro Leu Arg Trp Leu Asn Tyr 
485 490 495 

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu He Thr Glu Asn Asp 
500 505 510 

Leu He He Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro 
515 520 525 
Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro 
530 535 540 

Pro Lys Asp Leu He Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser 
545 550 555 560 

His He Gin Asn Tyr He Glu His Glu Lys Leu Lys Phe Ser Glu Ser 
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gin Ser Tyr Leu Asn 
595 600 605 

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser 
610 615 620 

Val Gly Arg Met Phe Ala Met Gin Pro Gly Met Phe Arg Gin He Gin 
625 630 635 640 

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro 
645 650 655 

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gin Lys He Leu Glu 
660 665 670 

Leu Lys Ala Gly He Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr 
675 680 685 

Asn Asn Tyr He Ser Lys Cys Ser He He Thr Asp Leu Ser Lys Phe 
690 695 700 

Asn Gin Ala Phe Arg Tyr Glu Thr Ser Cys He Cys Ser Asp Val Leu 
705 710 715 720 

Asp Glu Leu His Gly Val Gin Ser Leu Phe Ser Trp Leu His Leu Thr 
725 730 735 

He Pro Leu Val Thr He He Cys Thr Tyr Arg His Ala Pro Pro Phe 
740 745 750 

He Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gin Ser Gly 
755 760 765 

Leu Tyr Arg Tyr His Met Gly Gly He Glu Gly Trp Cys Gin Lys Leu 
770 775 780 

Trp Thr He Glu Ala He Ser Leu Leu Asp Leu He Ser Leu Lys Gly 
785 790 795 800 
Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp 
805 810 815 

He Ser Lys Pro Val Arg Leu He Glu Gly Gin Thr His Ala Gin Ala 
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly He Gly His Lys Leu Lys Gly Thr Glu Thr Tyr He Ser Arg 
850 855 860 

Asp Met Gin Phe Met Ser Lys Thr He Gin His Asn Gly Val Tyr Tyr 
865 870 875 880 

Pro Ala Ser He Lys Lys Val Leu Arg Val Gly Pro Trp He Asn Thr 
885 890 895 

He Leu Asp Asp Phe Lys Val Ser Leu Glu Ser He Gly Ser Leu Thr 
900 905 910 

Gin Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu He Phe 
915 920 925 

Arg Asn He Trp Leu Tyr Asn Gin He Ala Leu Gin Leu Arg Asn His 
930 935 940 

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp He Leu Lys Val Leu Lys 
945 950 955 960 

His Leu Lys Thr Phe Phe Asn Leu Asp Ser He Asp Met Ala Leu Ser 
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

He Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu 
1010 1015 1020 

Gin Asp Lys Leu Gin Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu 
1025 1030 1035 1040 

Thr Cys Val He Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr 
1045 1050 1055 

Leu Met Arg Asp Pro Gin Ala Leu Gly Ser Glu Arg Gin Ala Lys He 
1060 1065 1070 

Thr Ser Glu He Asn Arg Leu Ala Val Thr Glu Val Leu Ser He Ala 
1075 1080 1085 

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu 
1090 1095 1100 

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His 
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

He Val Asn Leu He Ser Gly Thr Lys Ser He Thr Asn He Leu Glu 
1140 1145 1150 

Lys Thr Ser Ala He Asp Thr Thr Asp He Asn Arg Ala Thr Asp Met 
1155 1160 1165 

Met Arg Lys Asn He Thr Leu Leu He Arg He Leu Pro Leu Asp Cys 
1170 1175 1180 

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser He Thr 
1185 1190 1195 1200 

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn He 
1205 1210 1215 

Val Gly Val Thr Ser Pro Ser He Met Phe Thr Met Asn He Lys Tyr 
1220 1225 1230 

Thr Thr Ser Thr He Ala Ser Gly He He He Glu Lys Tyr Asn Val 
1235 1240 1245 

Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly 
1250 1255 1260 

Ser Ser Thr Gin Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gin Val 
1265 1270 1275 1280 

Leu Thr Lys Lys Gin Arg Asp Gin He Asp Leu Leu Ala Lys Leu Asp 
1285 1290 1295 

Trp Val Tyr Ala Ser He Asp Asn Lys Asp Glu Phe Met Glu Glu Leu 
1300 1305 1310 

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gin Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser 
1330 1335 1340 

Arg Pro Cys Glu Phe Pro Ala Ser He Pro Ala Tyr Arg Thr Thr Asn 
1345 1350 1355 1360 
Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr 
1365 1370 1375 

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly 
1380 1385 1390 

Leu Ser Leu Met Ser Val Val Glu Gin Phe Thr Asn He Cys Pro Asn 
1395 1400 1405 

Arg He He Leu He Pro Lys Leu Asn Glu He His Leu Met Lys Pro 
1410 1415 1420 

Pro He Phe Thr Gly Asp Val Asp He He Lys Leu Lys Gin Val He 
1425 1430 1435 1440 

Gin Lys Gin His Met Phe Leu Pro Asp Lys He Ser Leu Thr Gin Tyr 
1445 1450 1455 

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His He 
1460 1465 1470 

Asn Ser Asn Leu He Leu Val His Lys Met Ser Asp Tyr Phe His Asn 
1475 1480 1485 

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile 
1490 1495 1500 

Gin Leu Met Lys Asp Ser Lys Gly He Phe Glu Lys Asp Trp Gly Glu 
1505 1510 1515 1520 

Gly Tyr He Thr Asp His Met Phe He Asn Leu Asn Val Phe Phe Asn 
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu He Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu 
1570 1575 1580 

Gin Lys Val He Lys Tyr He Val Asn Gin Asp Thr Ser Leu Arg Arg 
1585 1590 1595 1600 

He Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asn 
1605 1610 1615 

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn He Asp Tyr His 
1620 1625 1630 
Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met 
1635 1640 1645 

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe 
1650 1655 1660 

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe 
1665 1670 1675 1680 

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser 
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 



Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Asn Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 15219 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ACGGGAAAAA AATGCGTACT ACAAACTTGC ACATTCGAAA AAAATGGGGC AAATAAGAAC 
TTGATAAGTG CTATTTAAGT CTAACCTTTT CAATCAGAAA TGGGGTGCAA TTCACTGAGC 
ATGATAAAGG TTAGATTACA AAATTTATTT GACAATGACG AAGTAGCATT GTTAAAAATA 
ACATGTTATA CTGATAAATT AATTCTTCTG ACCAATGCAT TAGCCAAAGC AGCAATACAT 
ACAATTAAAT TAAACGGCAT AGTTTTTATA CATGTTATAA CAAGCAGTGA AGTGTGCCCT 
GATAACAATA TTGTAGTGAA ATCTAACTTT ACAACAATGC CAATACTACA AAATGGAGGA 
TACATATGGG AATTGATTGA GTTGACACAC TGCTCTCAAT TAAACGGTTT AATGGATGAT 
AATTGTGAAA TCAAATTTTC TAAAAGACTA AGTGACTCAG TAATGACTAA TTATATGAAT 
CAAATATCTG ACTTACTTGG GCTTGATCTC AATTCATGAA TTATGTTTAG TCTAATTCAA 
TAGACATGTG TTTATTACCA TTTTAGTTAA TATAAAAACT CATCAAAGGG AAATGGGGCA 
AATAAACTCA CCTAATCAAT CAAACCATGA GCACTACAAA TGACAACACT ACTATGCAAA 
GATTGATGAT CACAGACATG AGACCCCTGT CAATGGATTC AATAATAACA TCTCTTACCA 
AAGAAATCAT CACACACAAA TTCATATACT TGATAAACAA TGAATGTATT GTAAGAAAAC 
TTGATGAAAG ACAAGCTACA TTTACATTCT TAGTCAATTA TGAGATGAAG CTACTGCACA 
AAGTAGGGAG TACCAAATAC AAAAAATACA CTGAATATAA TACAAAATAT GGCACTTTCC 
CCATGCCTAT ATTTATCAAT CACGGCGGGT TTCTAGAATG TATTGGCATT AAGCCTACAA 
AACACACTCC TATAATATAC AAATATGACC TCAACCCGTG AATTCCAACA AAAAAACCAA 
CCCAACCAAA CCAAACTATT CCTCAAACAA CAGTGCTCAA TAGTTAAGAA GGAGCTAATC 
CATTTTAGTA ATTAAAAATA AAAGTAAAGC CAATAACATA AATTGGGGCA AATACAAAGA 
TGGCTCTTAG CAAAGTCAAG TTGAATGATA CATTAAATAA GGATCAGCTG CTGTCATCCA
GCAAATACAC TATTCAACGT AGTACAGGAG ATAATATTGA CACTCCCAAT TATGATGTGC
AAAAACACCT AAACAAACTA TGTGGTATGC TATTAATCAC TGAAGATGCA AATCATAAAT 
TCACAGGATT AATAGGTATG TTATATGCTA TGTCCAGGTT AGGAAGGGAA GACACTATAA 
AGATACTTAA AGATGCTGGA TATCATGTTA AAGCTAATGG AGTAGATATA ACAACATATC 
GTCAAGATAT AAATGGAAAG GAAATGAAAT TCGAAGTATT AACATTATCA AGCTTGACAT 
CAGAAATACA AGTCAATATT GAGATAGAAT CTAGAAAGTC CTACAAAAAA ATGCTAAAAG 
AGATGGGAGA AGTGGCTCCA GAATATAGGC ATGATTCTCC AGACTGTGGG ATGATAATAC 
TGTGTATAGC TGCACTTGTG ATAACCAAAT TAGCAGCAGG AGACAGATCA GGTCTTACAG 
CAGTAATTAG GAGGGCAAAC AATGTCTTAA AAAACGAAAT AAAACGATAC AAGGGCCTCA 
TACCAAAGGA TATAGCTAAC AGTTTTTATG AAGTGTTTGA AAAACACCCT CATCTTATAG 
ATGTTTTCGT GCACTTTGGC ATTGCACAAT CATCCACAAG AGGGGGTAGT AGAGTTGAAG 
GAATCTTTGC AGGATTGTTT ATGAATGCCT ATGGTTCAGG GCAAGTAATG CTAAGATGGG 
GAGTTTTAGC CAAATCTGTA AAAAATATCA TGCTAGGACA TGCTAGTGTC CAGGCAGAAA 
TGGAGCAAGT TGTGGAAGTC TATGAGTATG CACAGAAGTT GGGAGGAGAA GCTGGATTCT 
ACCATATATT GAACAATCCA AAAGCATCAT TGCTGTCATT AACTCAATTT CCCAACTTCT 
CAAGTGTGGT CCTAGGCAAT GCAGCAGGTC TAGGCATAAT GGGAGAGTAT AGAGGTACAC 
CAAGAAACCA GGATCTTTAT GATGCAGCTA AAGCATATGC AGAGCAACTC AAAGAAAATG 
GAGTAATAAA CTACAGTGTA TTAGACTTAA CAGCAGAAGA ATTGGAAGCC ATAAAGCATC 
AACTCAACCC CAAAGAAGAT GATGTAGAGC TTTAAGTTAA CAAAAAATAC GGGGCAAATA 
AGTCAACATG GAGAAGTTTG CACCTGAATT TCATGGAGAA GATGCAAATA ACAAAGCTAC 
CAAATTCCTA GAATCAATAA AGGGCAAGTT CGCATCATCC AAAGATCCTA AGAAGAAAGA 
TAGCATAATA TCTGTTAACT CAATAGATAT AGAAGTAACT AAAGAGAGCC CGATAACATC 
TGGCACCAAC ATCATCAATC CAACAAGTGA AGCCGACAGT ACCCCAGAAA CAAAAGCCAA 
CTACCCAAGA AAACCCCTAG TAAGCTTCAA AGAAGATCTC ACCCCAAGTG ACAACCCTTT 
TTCTAAGTTG TACAAGGAAA CAATAGAAAC ATTTGATAAC AATGAAGAAG AATCTAGCTA 
CTCATATGAA GAGATAAATG ATCAAACAAA TGACAACATT ACAGCAAGAC TAGATAGAAT
TGATGAAAAA TTAAGTGAAA TATTAGGAAT GCTCCATACA TTAGTAGTTG CAAGTGCAGG
ACCCACTTCA GCTCGCGATG GAATAAGAGA TGCTATGGTT GGTCTAAGAG AAGAGATGAT 
AGAAAAAATA AGAGCGGAAG CATTAATGAC CAATGATAGG TTAGAGGCTA TGGCAAGACT 
TAGGAATGAG GAAAGCGAAA AAATGGCAAA AGACACCTCA GATGAAGTGT CTCTTAATCC 
AACTTCCAAA AAATTGAGTG ACTTGTTGGA AGACAACGAT AGTGACAATG ATCTATCACT 
TGATGATTTT TGATCAGCGA TCAACTCACT CAGCAATCAA CAACATCAAT AAAACAGACA 
TCAATCCATT GAATCAACTG CCAGACCGAA CAAACAAACG TCCATCAGTA GAACCACCAA 
CCAATCAATC AACCAATTGA TCAATCAGCA ACCCGACAAA ATTAACAATA TAGTAACAAA 
AAAAGAACAA GATGGGGCAA ATATGGAAAC ATACGTGAAC AAGCTTCACG AAGGCTCCAC 
ATACACAGCA GCTGTTCAGT ACAATGTTCT AGAAAAAGAT GATGATCCTG CATCACTAAC 
AATATGGGTG CCTATGTTCC AGTCATCTGT GCCAGCAGAC TTGCTCATAA AAGAACTTGC 
AAGCATCAAT ATACTAGTGA AGCAGATCTC TACGCCCAAA GGACCTTCAC TACGAGTCAC 
GATTAACTCA AGAAGTGCTG TGCTGGCTCA AATGCCTAGT AATTTCATCA TAAGCGCAAA 
TGTATCATTA GATGAAAGAA GCAAATTAGC ATATGATGTA ACTACACCTT GTGAAATCAA 
AGCATGCAGT CTAACATGCT TAAAAGTAAA AAGTATGTTA ACTACAGTCA AAGATCTTAC 
CATGAAGACA TTCAACCCCA CTCATGAGAT CATTGCTCTA TGTGAATTTG AAAATATTAT 
GACATCAAAA AGAGTAATAA TACCAACCTA TCTAAGATCA ATTAGTGTCA AGAACAAGGA 
TCTGAACTCA CTAGAAAATA TAGCAACCAC CGAATTCAAA AATGCTATCA CCAATGCAAA 
AATTATTCCT TATGCAGGAT TAGTGTTAGT TATCACAGTT ACTGACAATA AAGGAGCATT 
CAAATATATC AAACCACAGA GTCAATTTAT AGTAGATCTT GGTGCCTACC TAGAAAAAGA 
GAGCATATAT TATGTGACTA CTAATTGGAA GCATACAGCT ACACGTTTTT CAATCAAACC 
ACTAGAGGAT TAAACTTAAT TATCAACACT GAATGACAGG TCCACATATA TCCTCAAACT 
ACACACTATA TCCAAACATC ATAAACATCT ACACTACACA CTTCATCACA CAAACCAATC 
CCACTCAAAA TCCAAAATCA CTACCAGCCA CTATCTGCTA GACCTAGAGT GCGAATAGGT 
AAATAAAACC AAAATATGGG GTAAATAGAC ATTAGTTAGA GTTCAATCAA TCTTAACAAC 
CATTTATACC GCCAATTCAA CACATATACT ATAAATCTTA AAATGGGAAA TACATCCATC
ACAATAGAAT TCACAAGCAA ATTTTGGCCC TATTTTACAC TAATACATAT GATCTTAACT
CTAATCTTTT TACTAATTAT AATCACTATT ATGATTGCAA TACTAAATAA GCTAAGTGAA 
CATAAAGCAT TCTGTAACAA AACTCTTGAA CTAGGACAGA TGTATCAAAT CAACACATAG 
AGTTCTACCA TTATGCTGTG TCAAATTATA ATCCTGTATA TATAAACAAA CAAATCCAAT 
CTTCTCACAG AGTCATGGTG TCGCAAAACC ACGCTAACTA TCATGGTAGC ATAGAGTAGT 
TATTTAAAAA TTAACATAAT GATGAATTGT TAGTATGAGA TCAAAAACAA CATTGGGGCA 
AATGCAACCA TGTCCAAACA CAAGAATCAA CGCACTGCCA GGACTCTAGA AAAGACCTGG 
GATACTCTTA ATCATCTAAT TGTAATATCC TCTTGTTTAT ACAGATTAAA TTTAAAATCT 
ATAGCACAAA TAGCACTATC AGTTTTGGCA ATGATAATCT CAACCTCTCT CATAATTGCA 
GCCATAATAT TCATCATCTC TGCCAATCAC AAAGTTACAC TAACAACGGT CACAGTTCAA 
ACAATAAAAA ACCACACTGA AAAAAACATC ACCACCTACC CTACTCAAGT CTCACCAGAA 
AGGGTTAGTT CATCCAAGCA ACCCACAACC ACATCACCAA TCCACACAAG TTCAGCTACA 
ACATCACCCA ATACAAAATC AGAAACACAC CATACAACAG CACAAACCAA AGGCAGAACC 
ACCACTTCAA CACAGACCAA CAAGCCAAGC ACAAAACCAC GTCCAAAAAA TCCACCAAAA
AAAGATGATT ACCATTTTGA AGTGTTCAAC TTCGTTCCCT GCAGTATATG TGGCAACAAT 
CAACTTTGCA AATCCATCTG CAAAACAATA CCAAGCAACA AACCAAAGAA GAAACCAACC 
ATCAAACCCA CAAACAAACC AACCACCAAA ACCACAAACA AAAGAGACCC AAAAACACCA 
GCCAAAACGA CGAAAAAAGA AACTACCACC AACCCAACAA AAAAACTAAC CCTCAAGACC 
ACAGAAAGAG ACACCAGCAC CTCACAATCC ACTGCACTCG ACACAACCAC ATTAAAACAC 
ACAGTCCAAC AGCAATCCCT CCTCTCAACC ACCCCCGAAA ACACACCCAA CTCCACACAA 
ACACCCACAG CATCCGAGCC CTCCACACCA AACTCCACCC AAAAAACCCA GCCACATGCT 
TAGTTATTCA AAAACTACAT CTTAGCAGAG AACCGTGATC TATCAAGCAA GAACGAAATT 
AAACCTGGGG CAAATAACCA TGGAGTTGAT GATCCACAAG TCAAGTGCAA TCTTCCTAAC 
TCTTGCTATT AATGCATTGT ACCTCACCTC AAGTCAGAAC ATAACTGAGG AGTTTTACCA 
ATCGACATGT AGTGCAGTTA GCAGAGGTTA TTTTAGTGCT TTAAGAACAG GTTGGTATAC 
TAGTGTCATA ACAATAGAAT TAAGTAATAT AAAAGAAACC AAATGCAATG GAACTGACAC
TAAAGTAAAA CTTATGAAAC AAGAATTAGA TAAGTATAAG AATGCAGTAA CAGAATTACA
GCTACTTATG CAAAACACAC CAGCTGTCAA CAACCGGGCC AGAAGAGAAG CACCACAGTA 
TATGAACTAC ACAATCAATA CCACTAAAAA CCTAAATGTA TCAATAAGCA AGAAGAGGAA 
ACGAAGATTT CTAGGCTTCT TGTTAGGTGT GGGATCTGCA ATAGCAAGTG GTATAGCTGT 
ATCAAAAGTT CTACACCTTG AAGGAGAAGT GAACAAGATC AAAAATGCTT TGTTGTCTAC 
AAACAAAGCT GTAGTCAGTT TATCAAATGG GGTCAGTGTT TTAACCAGCA AAGTGTTAGA 
TCTCAAGAAT TACATAAATA ACCAATTATT ACCCATAGTA AATCAACAGA GCTGTCGCAT 
CTCCAACATT GAAACAGTTA TAGAATTCCA GCAGAAGAAC AGCAGATTGT TGGAAATCAC 
CAGAGAATTT AGTGTCAATG CAGGTGTAAC AACACCTTTA AGCACTTACA TGTTGACAAA 
CAGTGAGTTA CTATCATTAA TCAATGATAT GCCTATAACA AATGATCAGA AAAAATTAAT 
GTCAAGCAAT GTTCAGATAG TAAGGCAACA AAGTTATTCC ATCATGTCTA TAATAAAGGA
AGAAGTCCTT GCATATGTTG TACAGCTGCC TATCTATGGT GTAATAGATA CACCTTGCTG 
GAAATTGCAC ACATCGCCTC TATGCACTAC CAACATCAAA GAAGGATCAA ATATTTGTTT
AACAAGGACT GATAGAGGAT GGTATTGTGA TAATGCAGGA TCAGTATCCT TCTTTCCACA 
GGCTGACACT TGTAAAGTAC AGTCCAATCG AGTATTTTGT GACACTATGA ACAGTTTGAC 
ATTACCAAGT GAAGTCAGCC TTTGTAACAC TGACATATTC AATTCCAAGT ATGACTGCAA 
AATTATGACA TCAAAAACAG ACATAAGCAG CTCAGTAATT ACTTCTCTTG GAGCTATAGT 
GTCATGCTAT GGTAAAACTA AATGCACTGC ATCCAACAAA AATCGTGGGA TTATAAAGAC 
ATTTTCTAAT GGTTGTGACT ATGTGTCAAA CAAAGGAGTA GATACTGTGT CAGTGGGCAA 
CACTTTATAC TATGTAAACA AGCTGGAAGG CAAGAACCTT TATGTAAAAG GGGAACCTAT 
AATAAATTAC TATGACCCTC TAGTGTTTCC TTCTGATGAG TTTGATGCAT CAATATCTCA 
AGTCAATGAA AAAATCAATC AAAGTTTAGC TTTTATTCGT AGATCTGATG AATTACTACA
TAATGTAAAT ACTGGCAAAT CTACTACAAA TATTATGATA ACTACAATTA TTATAGTAAT 
CATTGTAGTA TTGTTATCAT TAATAGCTAT TGGTTTACTG TTGTATTGTA AAGCCAAAAA 
CACACCAGTT ACACTAAGCA AAGACCAACT AAGTGGAATC AATAATATTG CATTCAGCAA 
ATAGACAAAA AACCACCTGA TCATGTTTCA ACAACAATCT GCTGACCACC AATCCCAAAT
CAACTTACAA CAAATATTTC AACATCACAG TACAGGCTGA ATCATTTCCT CACATCATGC
TACCCACATA ACTAAGCTAG ATCCTTAACT TATAGTTACA TAAAAACCTC AAGTATCACA 
ATCAACCACT AAATCAACAC ATCATTCACA AAATTAACAG CTGGGGCAAA TATGTCGCGA 
AGAAATCCTT GTAAATTTGA GATTAGAGGT CATTGCTTGA ATGGTAGAAG ATGTCACTAC 
AGTCATAATT ACTTTGAATG GCCTCCTCAT GCATTACTAG TGAGGCAAAA CTTCATGTTA 
AACAAGATAC TCAAGTCAAT GGACAAAAGC ATAGACACTT TGTCTGAAAT AAGTGGAGCT 
GCTGAACTGG ATAGAACAGA AGAATATGCT CTTGGTATAG TTGGAGTGCT AGAGAGTTAC 
ATAGGATCTA TAAACAACAT AACAAAACAA TCAGCATGTG TTGCTATGAG TAAACTTCTT 
ATTGAGATCA ATAGTGATGA CATTAAAAAG CTTAGAGATA ATGAAGAACC CAATTCACCT 
AAGATAAGAG TGTACAATAC TGTTATATCA TACATTGAGA GCAATAGAAA AAACAACAAG 
CAAACCATCC ATCTGCTCAA GAGACTACCA GCAGACGTGC TGAAGAAGAC AATAAAGAAC 
ACATTAGATA TCCACAAAAG CATAACCATA AGCAATCCAA AAGAGTCAAC TGTGAATGAT 
CAAAATGACC AAACCAAAAA TAATGATATT ACCGGATAAA TATCCTTGTA GTATATCATC 
CATATTGATC TCAAGTGAAA GCATGGTTGC TACATTCAAT CATAAAAACA TATTACAATT 
TAACCATAAC TATTTGGATA ACCACCAGCG TTTATTAAAT CATATATTTG ATGAAATTCA 
TTGGACACCT AAAAACTTAT TAGATGCCAC TCAACAATTT CTCCAACATC TTAACATCCC 
TGAAGATATA TATACAGTAT ATATATTAGT GTCATAATGC TTGACCATAA CGACTCTATG 
TCATCCAACC ATAAAACTAT TTTGATAAGG TTATGGGACA AAATGGATCC CATTATTAAT 
GGAAACTCTG CTAATGTGTA TCTAACTGAT AGTTATTTAA AAGGTGTTAT CTCTTTTTCA 
GAGTGTAATG CTTTAGGGAG TTATCTTTTT AACGGCCCTT ATCTTAAAAA TGATTACACC 
AACTTAATTA GTAGACAAAG CCCACTACTA GAGCATATGA ATCTTAAAAA ACTAACTATA 
ACACAGTCAT TAATATCTAG ATATCATAAA GGTGAACTGA AATTAGAAGA ACCAACTTAT 
TTCCAGTCAT TACTTATGAC ATATAAAAGT ATGTCCTCGT CTGAACAAAT TGCTACAACT 
AACTTACTTA AAAAAATAAT ACGAAGAGCC ATAGAAATAA GTGATGTAAA GGTGTACGCC 
ATCTTGAATA AACTAGGATT AAAGGAAAAG GACAGAGTTA AGCCCAACAA TAATTCAGGT 
GATGAAAACT CAGTACTTAC AACTATAATT AAAGATGATA TACTTTCGGC TGTGGAAAAC
AATCAATCAT ATACAAATTC AGACAAAAGT CACTCAGTAA ATCAAAATAT CACTATCAAA
ACAACACTCT TGAAAAAATT GATGTGTTCA ATGCAACATC CTCCATCATG GTTAATACAC 
TGGTTCAATT TATATACAAA ATTAAATAAC ATATTAACAC AATATCGATC AAATGAGGTA 
AAAAGTCATG GGTTTATATT AATAGATAAT CAAACTTTAA GTGGTTTTCA GTTTATTTTA 
AATCAATATG GTTGTATCGT TTATCATAAA GGACTCAAAA AAATCACAAC TACTACTTAC
AATCAATTTT TGACATGGAA AGACATCAGC CTTAGCAGAT TAAATGTTTG CTTAATTACT 
TGGATAAGTA ATTGTTTAAA TACATTAAAC AAAAGCTTAG GGCTGAGATG TGGATTCAAT 
AATGTTGTGT TATCACAATT ATTTCTTTAT GGAGATTGTA TACTGAAATT ATTTCATAAT 
GAAGGCTTCT ACATAATAAA AGAAGTAGAG GGATTTATTA TGTCTTTAAT TCTAAACATA 
ACAGAAGAAG ATCAATTTAG GAAACGATTT TATAATAGCA TGCTAAATAA CATCACAGAT 
GCAGCTATTA AGGCTCAAAA GGACCTACTA TCAAGAGTAT GTCACACTTT ATTAGACAAG 
ACAGTGTCTG ATAATATCAT AAATGGTAAA TGGATAATCC TATTAAGTAA ATTTCTTAAA 
TTGATTAAGC TTGCAGGTGA TAATAATCTC AATAACTTGA GTGAGCTATA TTTTCTCTTC 
AGAATCTTTG GACATCCAAT GGTCGATGAA AGACAAGCAA TGGATTCTGT AAGAATTAAC 
TGTAATGAAA CTAAGTTCTA CTTATTAAGT AGTCTAAGTA CATTAAGAGG TGCTTTCATT 
TATAGAATCA TAAAAGGGTT TGTAAATACC TACAACAGAT GGCCCACCTT AAGGAATGCT 
ATTGTCCTAC CTCTAAGATG GTTAAACTAC TATAAACTTA ATACTTATCC ATCTCTACTT 
GAAATCACAG AAAATGATTT GATTATTTTA TCAGGATTGC GGTTCTATCG TGAGTTTCAT 
CTGCCTAAAA AAGTGGATCT TGAAATGATA ATAAATGACA AAGCCATTTC ACCTCCAAAA 
GATCTAATAT GGACTAGTTT TCCTAGAAAT TACATGCCAT CACATATACA AAATTATATA 
GAACATGAAA AGTTGAAGTT CTCTGAAAGC GACAGATCGA GAAGAGTACT AGAGTATTAC 
TTGAGAGATA ATAAATTCAA TGAATGCGAT CTATACAATT GTGTAGTCAA TCAAAGCTAT
CTCAACAACT CTAATCACGT GGTATCACTA ACTGGTAAAG AAAGAGAGCT CAGTGTAGGT 
AGAATGTTTG CTATGCAACC AGGTATGTTT AGGCAAATCC AAATCTTAGC AGAGAAAATG 
ATAGCTGAAA ATATTTTACA ATTCTTCCCT GAGAGTTTGA CAAGATATGG TGATCTAGAG 
CTTCAAAAGA TATTAGAATT AAAAGCAGGA ATAAGCAACA AGTCAAATCG TTATAATGAT
AACTACAACA ATTATATCAG TAAATGTTCT ATCATTACAG ATCTTAGCAA ATTCAATCAG
GCATTTAGAT ATGAAACATC ATGTATCTGC AGTGATGTAT TAGATGAACT GCATGGAGTA 
CAATCTCTGT TCTCTTGGTT GCATTTAACA ATACCTCTTG TCACAATAAT ATGTACATAT 
AGACATGCAC CTCCTTTCAT AAAGGATCAT GTTGTTAATC TTAATGAGGT TGATGAACAA 
AGTGGATTAT ACAGATATCA TATGGGTGGT ATTGAGGGCT GGTGTCAAAA ACTGTGGACC 
ATTGAAGCTA TATCATTATT AGATCTAATA TCTCTCAAAG GGAAATTCTC TATCACAGCT 
CTGATAAATG GTGATAATCA GTCAATTGAT ATAAGCAAAC CAGTTAGACT TATAGAGGGT 
CAGACCCATG CACAAGCAGA TTATTTGTTA GCATTAAATA GCCTTAAATT GTTATATAAA 
GAGTATGCAG GTATAGGCCA TAAGCTTAAG GGAACAGAGA CCTATATATC CCGAGATATG 
CAGTTCATGA GCAAAACAAT CCAGCACAAT GGAGTGTACT ATCCAGCCAG TATCAAAAAA 
GTCCTGAGAG TAGGTCCATG GATAAACACG ATACTTGATG ATTTTAAAGT TAGTTTAGAA 
TCTATAGGCA GCTTAACACA GGAGTTAGAA TACAGAGGAG AAAGCTTATT ATGCAGTTTA 
ATATTTAGGA ACATTTGGTT ATACAATCAA ATTGCTTTGC AACTCCGAAA TCATGCATTA 
TGTAACAATA AGCTATATTT AGATATATTG AAAGTATTAA AACACTTAAA AACTTTTTTT 
AATCTTGATA GCATTGATAT GGCTTTATCA TTGTATATGA ATTTGCCTAT GCTGTTTGGT 
GGTGGTGATC CTAATTTGTT ATATCGAAGC TTTTATAGGA GAACTCCAGA CTTCCTTACA 
GAAGCTATAG TACATTCAGT GTTTGTGTTG AGCTATTATA CTGGTCACGA TTTACAAGAT 
AAGCTCCAGG ATCTTCCAGA TGATAGACTG AACAAATTCT TGACATGTGT CATCACATTT 
GATAAAAATC CCAATGCCGA GTTTGTAACA TTGATGAGGG ATCCACAGGC TTTAGGGTCT 
GAAAGGCAAG CTAAAATTAC TAGTGAGATT AATAGATTAG CAGTAACAGA AGTCTTAAGT 
ATAGCCCCAA ACAAAATATT TTCTAAAAGT GCACAACATT ATACTACCAC TGAGATTGAT 
CTAAATGACA TTATGCAAAA TATAGAACCA ACTTACCCTC ATGGATTAAG AGTTGTTTAT 
GAAAGTTTAC CTTTTTATAA AGCAGAAAAA ATAGTTAATC TTATATCAGG AACAAAATCC 
ATAACTAATA TACTTGAAAA AACATCAGCA ATAGATACAA CTGATATTAA TAGGGCTACT 
GATATGATGA GGAAAAATAT AACTTTACTT ATAAGGATAC TTCCACTAGA TTGTAACAAA 
GACAAAAGAG AGTTATTAAG TTTAGAAAAT CTTAGTATAA CTGAATTAAG CAAGTATGTA
AGAGAAAGAT CTTGGTCATT ATCCAATATA GTAGGAGTAA CATCGCCAAG TATTATGTTC
ACAATGGACA TTAAATATAC AACTAGCACT ATAGCCAGTG GTATAATAAT AGAAAAATAT 
AATGTTAATA GTTTAACTCG TGGTGAAAGA GGACCCACCA AGCCATGGGT AGGCTCATCC 
ACGCAGGAGA AAAAAACAAT GCCAGTGTAC AACAGACAAG TTTTAACCAA AAAGCAAAGA 
GACCAAATAG ATTTATTAGC AAAATTAGAC TGGGTATATG CATCCATAGA CAACAAAGAT 
GAATTCATGG AAGAACTGAG TACTGGAACA CTTGGACTGT CATATGAAAA AGCCAAAAAG 
TTGTTTCCAC AATATCTAAG TGTCAATTAT TTACACCGTT TAACAGTCAG TAGTAGACCA 
TGTGAATTCC CTGCATCAAT ACCAGCTTAT AGAACAACAA ATTATCATTT TGATACTAGT 
CCTATCAATC ATGTATTAAC AGAAAAGTAT GGAGATGAAG ATATCGACAT TGTGTTTCAA 
AATTGCATAA GTTTTGGTCT TAGCCTGATG TCGGTTGTGG AACAATTCAC AAACATATGT 
CCTAATAGAA TTATTCTCAT ACCGAAGCTG AATGAGATAC ATTTGATGAA ACCTCCTATA 
TTTACAGGAG ATGTTGATAT CATCAAGTTG AAGCAAGTGA TACAAAAGCA GCACATGTTC 
CTACCAGATA AAATAAGTTT AACCCAATAT GTAGAATTAT TCTTAAGTAA CAAAGCACTT 
AAATCTGGAT CTCACATCAA CTCTAATTTA ATATTAGTAC ATAAAATGTC TGATTATTTT 
CATAATGCTT ATATTTTAAG TACTAATTTA GCTGGACATT GGATTCTGAT TATTCAACTT 
ATGAAAGATT CAAAAGGTAT TTTTGAAAAA GATTGGGGAG AGGGGTACAT AACTGATCAT 
ATGTTCATTA ATTTGAATGT TTTCTTTAAT GCTTATAAGA CTTATTTGCT ATGTTTTCAT 
AAAGGTTATG GTAAAGCAAA ATTAGAATGT GATATGAACA CTTCAGATCT TCTTTGTGTT 
TTGGAGTTAA TAGACAGTAG CTACTGGAAA TCTATGTCTA AAGTTTTCCT AGAACAAAAA 
GTCATAAAAT ACATAGTCAA TCAAGACACA AGTTTGCGTA GAATAAAAGG CTGTCACAGT 
TTTAAGTTGT GGTTTTTAAA ACGCCTTGAT AATGCTAAAT TTACCGTATG CCCTTGGGTT 
GTTAACATAG ATTATCACCC AACACACATG AAAGCTATAT TATCTTACAT AGATTTAGTT 
AGAATGGGGT TAATAAATGT AGATAAATTA ACCATTAAAA ATAAAAACAA ATTCAATGAT 
GAATTTTACA CATCAAATCT CTTTTACATT AGTTATAACT TTTCAGACAA CACTCATTTG 
CTAACAAAAC AAATAAGAAT TGCTAATTCA GAATTAGAAG ATAATTATAA CAAACTATAT 
CACCCAACCC CAGAAACTTT AGAAAATATG TCATTAATTC CTGTTAAAAG TAATAATAGT
AACAAACCTA AATTTTGTAT AAGTGGAAAT ACCGAATCTA TGATGATGTC AACATTCTCT
AGTAAAATGC ATATTAAATC TTCCACTGTT ACCACAAGAT TCAATTATAG CAAACAAGAC 
TTGTACAATT TATTTCCAAT TGTTGTGATA GACAAGATTA TAGATCATTC AGGTAATACA 
GCAAAATCTA ACCAACTTTA CACCACCACT TCACATCAGA CATCTTTAGT AAGGAATAGT
GCATCACTTT ATTGCATGCT TCCTTGGCAT CATGTCAATA GATTTAACTT TGTATTTAGT 
TCCACAGGAT GCAAGATCAG TATAGAGTAT ATTTTAAAAG ATCTTAAGAT TAAGGACCCC 
AGTTGTATAG CATTCATAGG TGAAGGAGCT GGTAACTTAT TATTACGTAC GGTAGTAGAA 
CTTCATCCAG ACATAAGATA CATTTACAGA AGTTTAAAAG ATTGCAATGA TCATAGTTTA 
CCTATTGAAT TTCTAAGGTT ATACAACGGG CATATAAACA TAGATTATGG TGAGAATTTA 
ACCATTCCTG CTACAGATGC AACTAATAAC ATTCATTGGT CTTATTTACA TATAAAATTT 
GCAGAACCTA TTAGCATCTT TGTCTGCGAT GCTGAATTAC CTGTTACAGC CAATTGGAGT 
AAAATTATAA TTGAATGGAG TAAGCATGTA AGAAAGTGCA AGTACTGTTC TTCTGTAAAT 
AGATGCATTT TAATTGCAAA ATATCATGCT CAAGATGACA TTGATTTCAA ATTAGATAAC 
ATTACTATAT TAAAAACTTA CGTGTGCCTA GGTAGCAAGT TAAAAGGATC TGAAGTTTAC 
TTAATCCTTA CAATAGGCCC TGCAAATATA CTTCCTGTTT TTGATGTTGT ACAAAATGCT 
AAATTGATAC TTTCAAGAAC TAAAAATTTC ATTATGCCTA AAAAAACTGA CAAGGAATCT 
ATCGATGCAG TTATTAAAAG CTTAATACCT TTCCTTTGTT ACCCTATAAC AAAAAAAGGA 
ATTAAGACTT CATTGTCAAA ATTGAAGAGT GTAGTTAATG GAGATATATT ATCATATTCT 
ATAGCTGGAC GTAATGAAGT ATTCAGCAAC AAGCTTATAA ACCACAAGCA TATGAATATC 
CTAAAATGGC TAGATCATGT TTTAAATTTT AGATCAGCTG AACTTAATTA CAATCATTTA
TACATGATAG AGTCCACATA TCCTTACTTA AGTGAATTGT TAAATAGTTT AACAACCAAT 
GAGCTCAAGA AGCTGATTAA AATAACAGGT AGTGTGCTAT ACAACCTTCC CAACGAACAG 
TAGTTTAAAA TATCATTAAC AAGTTTGGTC AAATTTAGAT GCTAACACAT CATTATATTA 
TAGTTATTAA AAAATATACA AACTTTTCAA TAATTTAGCA TATTGATTCC AAAATTATCA 
TTTTAGTCTT AAGGGGTTAA ATAAAAGTCT AAAACTAACA ATTATACATG TGCATTCACA 
ACACAACGAG ACATTAGTTT TTGACACTTT TTTTCTCGT 



(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2166 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 



Met Asp Pro Ile Ile Asn Gly Asn Ser Ala Asn Val Tyr Leu Thr Asp
1 5 10 15

Ser Tyr Leu Lys Gly Val Ile Ser Phe Ser Glu Cys Asn Ala Leu Gly
20 25 30

Ser Tyr Leu Phe Asn Gly Pro Tyr Leu Lys Asn Asp Tyr Thr Asn Leu
35 40 45

Ile Ser Arg Gln Ser Pro Leu Leu Glu His Met Asn Leu Lys Lys Leu
50 55 60



Thr Ile Thr Gln Ser Leu Ile Ser Arg Tyr His Lys Gly Glu Leu Lys
65 70 75 80

Leu Glu Glu Pro Thr Tyr Phe Gln Ser Leu Leu Met Thr Tyr Lys Ser
85 90 95

Met Ser Ser Ser Glu Gln Ile Ala Thr Thr Asn Leu Leu Lys Lys Ile
100 105 110

Ile Arg Arg Ala Ile Glu Ile Ser Asp Val Lys Val Tyr Ala Ile Leu
115 120 125 

Asn Lys Leu Gly Leu Lys Glu Lys Asp Arg Val Lys Pro Asn Asn Asn 
130 135 140 

Ser Gly Asp Glu Asn Ser Val Leu Thr Thr Ile Ile Lys Asp Asp Ile
145 150 155 160

Leu Ser Ala Val Glu Asn Asn Gln Ser Tyr Thr Asn Ser Asp Lys Ser
165 170 175

His Ser Val Asn Gln Asn Ile Thr Ile Lys Thr Thr Leu Leu Lys Lys
180 185 190

Leu Met Cys Ser Met Gln His Pro Pro Ser Trp Leu Ile His Trp Phe
195 200 205

Asn Leu Tyr Thr Lys Leu Asn Asn Ile Leu Thr Gln Tyr Arg Ser Asn
210 215 220

Glu Val Lys Ser His Gly Phe Ile Leu Ile Asp Asn Gln Thr Leu Ser
225 230 235 240

Gly Phe Gln Phe Ile Leu Asn Gln Tyr Gly Cys Ile Val Tyr His Lys
245 250 255

Gly Leu Lys Lys Ile Thr Thr Thr Thr Tyr Asn Gln Phe Leu Thr Trp
260 265 270

Lys Asp Ile Ser Leu Ser Arg Leu Asn Val Cys Leu Ile Thr Trp Ile
275 280 285 

Ser Asn Cys Leu Asn Thr Leu Asn Lys Ser Leu Gly Leu Arg Cys Gly 
290 295 300 

Phe Asn Asn Val Val Leu Ser Gln Leu Phe Leu Tyr Gly Asp Cys Ile
305 310 315 320

Leu Lys Leu Phe His Asn Glu Gly Phe Tyr Ile Ile Lys Glu Val Glu
325 330 335

Gly Phe Ile Met Ser Leu Ile Leu Asn Ile Thr Glu Glu Asp Gln Phe
340 345 350

Arg Lys Arg Phe Tyr Asn Ser Met Leu Asn Asn Ile Thr Asp Ala Ala
355 360 365

Ile Lys Ala Gln Lys Asp Leu Leu Ser Arg Val Cys His Thr Leu Leu
370 375 380

Asp Lys Thr Val Ser Asp Asn Ile Ile Asn Gly Lys Trp Ile Ile Leu
385 390 395 400

Leu Ser Lys Phe Leu Lys Leu Ile Lys Leu Ala Gly Asp Asn Asn Leu
405 410 415

Asn Asn Leu Ser Glu Leu Tyr Phe Leu Phe Arg Ile Phe Gly His Pro
420 425 430

Met Val Asp Glu Arg Gln Ala Met Asp Ser Val Arg Ile Asn Cys Asn
435 440 445 

Glu Thr Lys Phe Tyr Leu Leu Ser Ser Leu Ser Thr Leu Arg Gly Ala 
450 455 460 

Phe Ile Tyr Arg Ile Ile Lys Gly Phe Val Asn Thr Tyr Asn Arg Trp
465 470 475 480

Pro Thr Leu Arg Asn Ala Ile Val Leu Pro Leu Arg Trp Leu Asn Tyr
485 490 495

Tyr Lys Leu Asn Thr Tyr Pro Ser Leu Leu Glu Ile Thr Glu Asn Asp
500 505 510

Leu Ile Ile Leu Ser Gly Leu Arg Phe Tyr Arg Glu Phe His Leu Pro
515 520 525

Lys Lys Val Asp Leu Glu Met Ile Ile Asn Asp Lys Ala Ile Ser Pro
530 535 540

Pro Lys Asp Leu Ile Trp Thr Ser Phe Pro Arg Asn Tyr Met Pro Ser
545 550 555 560

His Ile Gln Asn Tyr Ile Glu His Glu Lys Leu Lys Phe Ser Glu Ser
565 570 575 

Asp Arg Ser Arg Arg Val Leu Glu Tyr Tyr Leu Arg Asp Asn Lys Phe 
580 585 590 

Asn Glu Cys Asp Leu Tyr Asn Cys Val Val Asn Gln Ser Tyr Leu Asn
595 600 605

Asn Ser Asn His Val Val Ser Leu Thr Gly Lys Glu Arg Glu Leu Ser
610 615 620

Val Gly Arg Met Phe Ala Met Gln Pro Gly Met Phe Arg Gln Ile Gln
625 630 635 640

Ile Leu Ala Glu Lys Met Ile Ala Glu Asn Ile Leu Gln Phe Phe Pro
645 650 655

Glu Ser Leu Thr Arg Tyr Gly Asp Leu Glu Leu Gln Lys Ile Leu Glu
660 665 670

Leu Lys Ala Gly Ile Ser Asn Lys Ser Asn Arg Tyr Asn Asp Asn Tyr
675 680 685

Asn Asn Tyr Ile Ser Lys Cys Ser Ile Ile Thr Asp Leu Ser Lys Phe
690 695 700

Asn Gln Ala Phe Arg Tyr Glu Thr Ser Cys Ile Cys Ser Asp Val Leu
705 710 715 720

Asp Glu Leu His Gly Val Gln Ser Leu Phe Ser Trp Leu His Leu Thr
725 730 735

Ile Pro Leu Val Thr Ile Ile Cys Thr Tyr Arg His Ala Pro Pro Phe
740 745 750 



Ile Lys Asp His Val Val Asn Leu Asn Glu Val Asp Glu Gln Ser Gly
755 760 765

Leu Tyr Arg Tyr His Met Gly Gly Ile Glu Gly Trp Cys Gln Lys Leu
770 775 780

Trp Thr Ile Glu Ala Ile Ser Leu Leu Asp Leu Ile Ser Leu Lys Gly
785 790 795 800

Lys Phe Ser Ile Thr Ala Leu Ile Asn Gly Asp Asn Gln Ser Ile Asp
805 810 815

Ile Ser Lys Pro Val Arg Leu Ile Glu Gly Gln Thr His Ala Gln Ala
820 825 830 

Asp Tyr Leu Leu Ala Leu Asn Ser Leu Lys Leu Leu Tyr Lys Glu Tyr 
835 840 845 

Ala Gly Ile Gly His Lys Leu Lys Gly Thr Glu Thr Tyr Ile Ser Arg
850 855 860

Asp Met Gln Phe Met Ser Lys Thr Ile Gln His Asn Gly Val Tyr Tyr
865 870 875 880

Pro Ala Ser Ile Lys Lys Val Leu Arg Val Gly Pro Trp Ile Asn Thr
885 890 895

Ile Leu Asp Asp Phe Lys Val Ser Leu Glu Ser Ile Gly Ser Leu Thr
900 905 910

Gln Glu Leu Glu Tyr Arg Gly Glu Ser Leu Leu Cys Ser Leu Ile Phe
915 920 925

Arg Asn Ile Trp Leu Tyr Asn Gln Ile Ala Leu Gln Leu Arg Asn His
930 935 940

Ala Leu Cys Asn Asn Lys Leu Tyr Leu Asp Ile Leu Lys Val Leu Lys
945 950 955 960

His Leu Lys Thr Phe Phe Asn Leu Asp Ser Ile Asp Met Ala Leu Ser
965 970 975 

Leu Tyr Met Asn Leu Pro Met Leu Phe Gly Gly Gly Asp Pro Asn Leu 
980 985 990 

Leu Tyr Arg Ser Phe Tyr Arg Arg Thr Pro Asp Phe Leu Thr Glu Ala 
995 1000 1005 

Ile Val His Ser Val Phe Val Leu Ser Tyr Tyr Thr Gly His Asp Leu
1010 1015 1020

Gln Asp Lys Leu Gln Asp Leu Pro Asp Asp Arg Leu Asn Lys Phe Leu
1025 1030 1035 1040

Thr Cys Val Ile Thr Phe Asp Lys Asn Pro Asn Ala Glu Phe Val Thr
1045 1050 1055

Leu Met Arg Asp Pro Gln Ala Leu Gly Ser Glu Arg Gln Ala Lys Ile
1060 1065 1070

Thr Ser Glu Ile Asn Arg Leu Ala Val Thr Glu Val Leu Ser Ile Ala
1075 1080 1085

Pro Asn Lys Ile Phe Ser Lys Ser Ala Gln His Tyr Thr Thr Thr Glu
1090 1095 1100

Ile Asp Leu Asn Asp Ile Met Gln Asn Ile Glu Pro Thr Tyr Pro His
1105 1110 1115 1120 

Gly Leu Arg Val Val Tyr Glu Ser Leu Pro Phe Tyr Lys Ala Glu Lys 
1125 1130 1135 

Ile Val Asn Leu Ile Ser Gly Thr Lys Ser Ile Thr Asn Ile Leu Glu
1140 1145 1150

Lys Thr Ser Ala Ile Asp Thr Thr Asp Ile Asn Arg Ala Thr Asp Met
1155 1160 1165

Met Arg Lys Asn Ile Thr Leu Leu Ile Arg Ile Leu Pro Leu Asp Cys
1170 1175 1180

Asn Lys Asp Lys Arg Glu Leu Leu Ser Leu Glu Asn Leu Ser Ile Thr
1185 1190 1195 1200

Glu Leu Ser Lys Tyr Val Arg Glu Arg Ser Trp Ser Leu Ser Asn Ile
1205 1210 1215

Val Gly Val Thr Ser Pro Ser Ile Met Phe Thr Met Asp Ile Lys Tyr
1220 1225 1230

Thr Thr Ser Thr Ile Ala Ser Gly Ile Ile Ile Glu Lys Tyr Asn Val
1235 1240 1245 



Asn Ser Leu Thr Arg Gly Glu Arg Gly Pro Thr Lys Pro Trp Val Gly
1250 1255 1260

Ser Ser Thr Gln Glu Lys Lys Thr Met Pro Val Tyr Asn Arg Gln Val
1265 1270 1275 1280

Leu Thr Lys Lys Gln Arg Asp Gln Ile Asp Leu Leu Ala Lys Leu Asp
1285 1290 1295

Trp Val Tyr Ala Ser Ile Asp Asn Lys Asp Glu Phe Met Glu Glu Leu
1300 1305 1310

Ser Thr Gly Thr Leu Gly Leu Ser Tyr Glu Lys Ala Lys Lys Leu Phe 
1315 1320 1325 

Pro Gln Tyr Leu Ser Val Asn Tyr Leu His Arg Leu Thr Val Ser Ser
1330 1335 1340

Arg Pro Cys Glu Phe Pro Ala Ser Ile Pro Ala Tyr Arg Thr Thr Asn
1345 1350 1355 1360

Tyr His Phe Asp Thr Ser Pro Ile Asn His Val Leu Thr Glu Lys Tyr
1365 1370 1375

Gly Asp Glu Asp Ile Asp Ile Val Phe Gln Asn Cys Ile Ser Phe Gly
1380 1385 1390

Leu Ser Leu Met Ser Val Val Glu Gln Phe Thr Asn Ile Cys Pro Asn
1395 1400 1405

Arg Ile Ile Leu Ile Pro Lys Leu Asn Glu Ile His Leu Met Lys Pro
1410 1415 1420

Pro Ile Phe Thr Gly Asp Val Asp Ile Ile Lys Leu Lys Gln Val Ile
1425 1430 1435 1440

Gln Lys Gln His Met Phe Leu Pro Asp Lys Ile Ser Leu Thr Gln Tyr
1445 1450 1455

Val Glu Leu Phe Leu Ser Asn Lys Ala Leu Lys Ser Gly Ser His Ile
1460 1465 1470

Asn Ser Asn Leu Ile Leu Val His Lys Met Ser Asp Tyr Phe His Asn
1475 1480 1485

Ala Tyr Ile Leu Ser Thr Asn Leu Ala Gly His Trp Ile Leu Ile Ile
1490 1495 1500

Gln Leu Met Lys Asp Ser Lys Gly Ile Phe Glu Lys Asp Trp Gly Glu
1505 1510 1515 1520

Gly Tyr Ile Thr Asp His Met Phe Ile Asn Leu Asn Val Phe Phe Asn
1525 1530 1535 

Ala Tyr Lys Thr Tyr Leu Leu Cys Phe His Lys Gly Tyr Gly Lys Ala 
1540 1545 1550 

Lys Leu Glu Cys Asp Met Asn Thr Ser Asp Leu Leu Cys Val Leu Glu 
1555 1560 1565 

Leu Ile Asp Ser Ser Tyr Trp Lys Ser Met Ser Lys Val Phe Leu Glu
1570 1575 1580

Gln Lys Val Ile Lys Tyr Ile Val Asn Gln Asp Thr Ser Leu Arg Arg
1585 1590 1595 1600

Ile Lys Gly Cys His Ser Phe Lys Leu Trp Phe Leu Lys Arg Leu Asp
1605 1610 1615

Asn Ala Lys Phe Thr Val Cys Pro Trp Val Val Asn Ile Asp Tyr His
1620 1625 1630

Pro Thr His Met Lys Ala Ile Leu Ser Tyr Ile Asp Leu Val Arg Met
1635 1640 1645

Gly Leu Ile Asn Val Asp Lys Leu Thr Ile Lys Asn Lys Asn Lys Phe
1650 1655 1660

Asn Asp Glu Phe Tyr Thr Ser Asn Leu Phe Tyr Ile Ser Tyr Asn Phe
1665 1670 1675 1680

Ser Asp Asn Thr His Leu Leu Thr Lys Gln Ile Arg Ile Ala Asn Ser
1685 1690 1695 

Glu Leu Glu Asp Asn Tyr Asn Lys Leu Tyr His Pro Thr Pro Glu Thr 
1700 1705 1710 

Leu Glu Asn Met Ser Leu Ile Pro Val Lys Ser Asn Asn Ser Asn Lys
1715 1720 1725

Pro Lys Phe Cys Ile Ser Gly Asn Thr Glu Ser Met Met Met Ser Thr
1730 1735 1740

Phe Ser Ser Lys Met His Ile Lys Ser Ser Thr Val Thr Thr Arg Phe
1745 1750 1755 1760

Asn Tyr Ser Lys Gln Asp Leu Tyr Asn Leu Phe Pro Ile Val Val Ile
1765 1770 1775

Asp Lys Ile Ile Asp His Ser Gly Asn Thr Ala Lys Ser Asn Gln Leu
1780 1785 1790

Tyr Thr Thr Thr Ser His Gln Thr Ser Leu Val Arg Asn Ser Ala Ser
1795 1800 1805 

Leu Tyr Cys Met Leu Pro Trp His His Val Asn Arg Phe Asn Phe Val 
1810 1815 1820 

Phe Ser Ser Thr Gly Cys Lys Ile Ser Ile Glu Tyr Ile Leu Lys Asp
1825 1830 1835 1840

Leu Lys Ile Lys Asp Pro Ser Cys Ile Ala Phe Ile Gly Glu Gly Ala
1845 1850 1855

Gly Asn Leu Leu Leu Arg Thr Val Val Glu Leu His Pro Asp Ile Arg
1860 1865 1870

Tyr Ile Tyr Arg Ser Leu Lys Asp Cys Asn Asp His Ser Leu Pro Ile
1875 1880 1885

Glu Phe Leu Arg Leu Tyr Asn Gly His Ile Asn Ile Asp Tyr Gly Glu
1890 1895 1900

Asn Leu Thr Ile Pro Ala Thr Asp Ala Thr Asn Asn Ile His Trp Ser
1905 1910 1915 1920

Tyr Leu His Ile Lys Phe Ala Glu Pro Ile Ser Ile Phe Val Cys Asp
1925 1930 1935

Ala Glu Leu Pro Val Thr Ala Asn Trp Ser Lys Ile Ile Ile Glu Trp
1940 1945 1950 

Ser Lys His Val Arg Lys Cys Lys Tyr Cys Ser Ser Val Asn Arg Cys 
1955 1960 1965 

Ile Leu Ile Ala Lys Tyr His Ala Gln Asp Asp Ile Asp Phe Lys Leu
1970 1975 1980

Asp Asn Ile Thr Ile Leu Lys Thr Tyr Val Cys Leu Gly Ser Lys Leu
1985 1990 1995 2000

Lys Gly Ser Glu Val Tyr Leu Ile Leu Thr Ile Gly Pro Ala Asn Ile
2005 2010 2015

Leu Pro Val Phe Asp Val Val Gln Asn Ala Lys Leu Ile Leu Ser Arg
2020 2025 2030

Thr Lys Asn Phe Ile Met Pro Lys Lys Thr Asp Lys Glu Ser Ile Asp
2035 2040 2045

Ala Val Ile Lys Ser Leu Ile Pro Phe Leu Cys Tyr Pro Ile Thr Lys
2050 2055 2060

Lys Gly Ile Lys Thr Ser Leu Ser Lys Leu Lys Ser Val Val Asn Gly
2065 2070 2075 2080

Asp Ile Leu Ser Tyr Ser Ile Ala Gly Arg Asn Glu Val Phe Ser Asn
2085 2090 2095

Lys Leu Ile Asn His Lys His Met Asn Ile Leu Lys Trp Leu Asp His
2100 2105 2110 

Val Leu Asn Phe Arg Ser Ala Glu Leu Asn Tyr Asn His Leu Tyr Met 
2115 2120 2125 

Ile Glu Ser Thr Tyr Pro Tyr Leu Ser Glu Leu Leu Asn Ser Leu Thr
2130 2135 2140

Thr Asn Glu Leu Lys Lys Leu Ile Lys Ile Thr Gly Ser Val Leu Tyr
2145 2150 2155 2160

Asn Leu Pro Asn Glu Gln
2165 



(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
CATATCACTC ACTCTGGGAT GGAG 24 
(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
TCAGAACATC AAGCACCGCC 20 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
ACAGTCAAGA CTGAGATGAG 20
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
AAGAGTCAGA TACATGTGGA 20 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
ACATGAATCA GCCTAAAGTC 20 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
CCGAAAGAGT TCCTGCGTTA CGACC 25
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 
CAGTCCACAC AAGTACCAGG 20 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 
GTCAGAAGCT GTGGACCATC 20 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
AATATTGCTA CAACAATGGC 20
(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
ACTCTTCATT CCTAGACTGG 20 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
GTCCAATTAT GACTATGAAC 20 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
AGAACAGACA TGAAGCTTGC 20
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 
CCAACAAGGA ATGCTTCTAG 20 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 
ACAGCACTAT CTATGATTGA CCTGG 25 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
GCAACATGGT TTACACATGC 20
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
AGATTGAGAG TTGATCCAGG 20 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
AGGAGATACT TAAACTAAGC 20 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
TAAGCTTATG CCTTTCAGCG 20
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53: 
TTAACGGACC TAAGCTGTGC 20 
(2) INFORMATION FOR SEQ ID NO:54: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54: 
GAAACAGATT ATTATGACGG 20 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 
CGGGCTATCT AGGTGAACTT CAGG 24
(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ATTTGGATAT GGAATATGAG 20 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ACTCAACTGA ACTACCAGTG 20 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: 
AAGAACATCA TGTATTTCAG 20
(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
TTATCAACGC ACTGCTCATG 20 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 
ATTTTCAGCA ATCACTTGGC ATGCC 25 
(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 
GCCTCTGTGC AAACAAGCTG 20 
(2) INFORMATION FOR SEQ ID NO: 62:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:
TCTCTAGTTA CTCTAGCAGC 20
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
AGGTCGTTGT TTGTGAGGAG 20 
(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 
TCGTCCTCTT CTTTACTGTC 20 
(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 
CCGTCCTCGA GCTAGCCTCG 20 
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
CTCCTCCAGG CTCACATTGG 20 
(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 
GGGTTGGTAC ATAGCTCTGC 20 
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 25 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
CACCCATCTG ATATTTCCCT GATGG 25
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 
TGGTTGACAG TACAAATCTG 20 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

CTGAAATGGG AAGATTGTGC 20 

(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
AGCAATCTAC ACTGCCTACC 20 
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72: 
TCACAGATGA TTCAATTATC 20 
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73: 
GATCCTAGAT ATAAGTTCTC 20
(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74:
ACCAAACAAA GTTGGGTAAG G 21
(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
GGGGGATCCA TCCCTAATCC TGCTCTTGTC CC 32 
(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 
GATTCCTCTG ATGGCTCCAC 20 
(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 



(C) STRANDEDNESS: single

(D) TOPOLOGY: linear



(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77:
TAACAGTCAA GGAGACCAAA G 21
(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 32 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
GGGAAGCTTA ACCCTAATCC TGCCCTAGGT GG 32 
(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: RNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 79: 
ACCAGACAAA GCTGGGAATA GA 22

What is claimed is:

1. An isolated, recombinantly-generated,
attenuated, nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales having
at least one attenuating mutation in the 3' genomic
promoter region and having at least one attenuating
mutation in the RNA polymerase gene.

2. The virus of Claim 1 wherein the virus
is from the Family Paramyxoviridae.

3. The virus of Claim 2 wherein the virus
is from the Subfamily Paramyxovirinae.

4. The virus of Claim 3 wherein the virus
is from the Genus Morbillivirus.

5. The virus of Claim 4 wherein the virus 
is measles virus. 

6. The measles virus of Claim 5 wherein:

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 26 (A -> T), nucleotide 42 (A
-> T or A -> C) and nucleotide 96 (G ->
A), where these nucleotides are
presented in positive strand,
antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 331 (isoleucine
-> threonine), 1409 (alanine ->
threonine), 1624 (threonine -> alanine),
1649 (arginine -> methionine), 1717
(aspartic acid -> alanine), 1936
(histidine -> tyrosine), 2074
(glutamine -> arginine) and 2114
(arginine -> lysine).

7. The virus of Claim 3 wherein the virus
is from the Genus Paramyxovirus.

8. The virus of Claim 7 wherein the virus
is human parainfluenza virus type 3 (PIV-3).

9. The PIV-3 of Claim 8 wherein:

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 23 (T -> C), nucleotide 24 (C
-> T), nucleotide 28 (G -> T) and
nucleotide 45 (T -> A), where these
nucleotides are presented in positive
strand, antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 942 (tyrosine ->
histidine), 992 (leucine ->
phenylalanine), 1292 (leucine ->
phenylalanine), and 1558 (threonine ->
isoleucine).

10. The virus of Claim 3 wherein the virus
is from the Genus Rubulavirus.

11. The virus of Claim 2 wherein the virus
is from the Subfamily Pneumovirinae.

12. The virus of Claim 11 wherein the virus
is from the Genus Pneumovirus.

13. The virus of Claim 12 wherein the virus
is human respiratory syncytial virus (RSV) subgroup B.

14. The virus of Claim 13 wherein: 

(a) the at least one attenuating mutation in
the 3' genomic promoter region is
selected from the group consisting of
nucleotide 4 (C -> G) and the insertion
of an additional A in the stretch of A's
at nucleotides 6-11, where these
nucleotides are presented in positive
strand, antigenomic, message sense; and

(b) the at least one attenuating mutation in
the RNA polymerase gene is selected from
the group consisting of nucleotide
changes which produce changes in an
amino acid selected from the group
consisting of residues 353 (arginine ->
lysine), 451 (lysine -> arginine), 1229
(aspartic acid -> asparagine), 2029
(threonine -> isoleucine) and 2050
(asparagine -> aspartic acid).

15. The virus of Claim 1 wherein the virus 
is from the Family Rhabdoviridae. 

16. The virus of Claim 1 wherein the virus 
is from the Family Filoviridae. 

17. A vaccine comprising an isolated, 
recombinantly-generated, attenuated, nonsegmented, 
negative-sense, single stranded RNA virus of the Order
Mononegavirales according to Claim 1 and a 
physiologically acceptable carrier. 

18. The vaccine of Claim 17 comprising a 
measles virus according to Claim 5 and a 
physiologically acceptable carrier.

19. The vaccine of Claim 18 comprising a
measles virus according to Claim 6 and a 
physiologically acceptable carrier. 

20. The vaccine of Claim 17 comprising a 
PIV-3 according to Claim 8 and a physiologically 
acceptable carrier.

21. The vaccine of Claim 20 comprising a 
PIV-3 according to Claim 9 and a physiologically 
acceptable carrier. 

22. The vaccine of Claim 17 comprising an 
RSV subgroup B according to Claim 13 and a 
physiologically acceptable carrier. 

23. The vaccine of Claim 22 comprising an 
RSV subgroup B according to Claim 14 and a 
physiologically acceptable carrier. 

24. A method for immunizing an individual to 
induce protection against a nonsegmented, negative- 
sense, single stranded RNA virus of the Order 
Mononegavirales which comprises administering to the 
individual the vaccine of Claim 17. 

25. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 18. 

26. The method of Claim 25 wherein the 
vaccine is the vaccine of Claim 19. 

27. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 20. 

28. The method of Claim 27 wherein the 
vaccine is the vaccine of Claim 21. 

29. The method of Claim 24 wherein the 
vaccine is the vaccine of Claim 22. 

30. The method of Claim 29 wherein the 
vaccine is the vaccine of Claim 23.

31. An isolated nucleic acid molecule
comprising a measles virus sequence in positive strand,
antigenomic message sense selected from the group
consisting of 1977 wild-type strain (SEQ ID NO:3), 1983
wild-type strain (SEQ ID NO:5) where the nucleotide
2499 is G or C, Montefiore wild-type strain (SEQ ID
NO:7), Rubeovax™ vaccine strain (SEQ ID NO:9), where
the nucleotide 2143 is T or C, Moraten vaccine strain
(SEQ ID NO:11), Schwarz vaccine strain (SEQ ID NO:11),
where the nucleotide 4917 is C and the nucleotide 4924
is C, and Zagreb vaccine strain (SEQ ID NO:13), and the
complementary genomic sequences thereof.

32. An isolated nucleic acid molecule
comprising a PIV-3 sequence in positive strand,
antigenomic message sense selected from the group
consisting of cp45 vaccine strain grown in fetal rhesus
lung cells (SEQ ID NO:19) and cp45 vaccine strain grown
in Vero cells (SEQ ID NO:21), and the complementary
genomic sequences thereof.

33. A composition which comprises a
transcription vector comprising an isolated nucleic
acid molecule encoding a genome or antigenome of a
nonsegmented, negative-sense, single stranded RNA virus
of the Order Mononegavirales having at least one
attenuating mutation in the 3' genomic promoter region
and having at least one attenuating mutation in the RNA
polymerase gene, together with at least one expression
vector which comprises at least one isolated nucleic
acid molecule encoding the trans-acting proteins
necessary for encapsidation, transcription and
replication, whereby upon expression an infectious
attenuated virus is produced.

34. The composition of Claim 33 wherein the
transcription vector comprises an isolated nucleic acid
molecule which encodes a measles virus according to
Claim 5 and the at least one expression vector
comprises at least one isolated nucleic acid molecule
encoding the trans-acting proteins N, P and L.

35. The composition of Claim 34 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a measles virus according to 
Claim 6. 

36. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 8 and 
the at least one expression vector comprises at least 
one isolated nucleic acid molecule encoding the trans- 
acting proteins NP, P and L. 

37. The composition of Claim 36 wherein the
transcription vector comprises an isolated nucleic acid 
molecule which encodes a PIV-3 according to Claim 9. 

38. The composition of Claim 33 wherein the 
transcription vector comprises an isolated nucleic acid 
molecule which encodes an RSV subgroup B according to 
Claim 13 and the at least one expression vector 
comprises at least one isolated nucleic acid molecule 
encoding the trans-acting proteins N, P, L and M2.

39. The composition of Claim 38 wherein the
transcription vector comprises an isolated nucleic acid 
molecule which encodes an RSV subgroup B according to 
Claim 14. 

40. A method for producing infectious
attenuated nonsegmented, negative-sense, single
stranded RNA virus of the Order Mononegavirales which
comprises transforming or transfecting host cells with
the at least two vectors of Claim 33 and culturing the
host cells under conditions which permit the
co-expression of these vectors so as to produce the
infectious attenuated virus.

41. The method of Claim 40 wherein the virus
is the measles virus of Claim 5.

42. The method of Claim 41 wherein the virus
is the measles virus of Claim 6.

43. The method of Claim 40 wherein the virus
is the PIV-3 of Claim 8.

44. The method of Claim 43 wherein the virus
is the PIV-3 of Claim 9.

45. The method of Claim 40 wherein the virus
is the RSV subgroup B of Claim 13.

46. The method of Claim 45 wherein the virus
is the RSV subgroup B of Claim 14.